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Author  :  Raymond  Honf  u  Chan 
Advisor  :  Olof  B.  Widlund 

Markovian  queueing  networks  having  overflow  capacity  are  discussed. 
The  Kolmogorov  balaiux  equations  result  in  a  linear  homogeneous  system, 
where  the  right  null-vector  is  the  steady-state  probability  distribution  for  the 
network.  The  dimension  of  the  system  is  equal  to  the  total  number  of  states 
in  the  networks  which  is  of  the  same  order  of  magnitude  as  the  product  of  all 
the  queue  sizes.  Thus  it  is  not  uncommon  that  the  order  of  the  matrix  exceeds 
10,000.  Preconditioned  conjugate  gradient  methods  are  employed  to  find  the 
null-vector.  The  preconditioner  is  a  singular  matrix  of  the  same  order  which 
can  be  handled  by  separation  of  variables.  The  resulting  preconditioned 
system  is  nonsingular.  Since  the  states  remaining  are  precisely  those  at  which 
interactions  between  the  queues  take  place,  the  dimension  of  this 
preconditioned  system  can  usuaUy  be  reduced  by  an  order  n,  where  n  is  the 
individual  queue  size.  Numerical  results  show  that  the  number  of  iterations 
required  for  convergence  is  roughly  constant  when  n  increases.  Analytic 
results  are  given  to  explain  this  fast  convergence. 
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Section  1.  Introdnction 


In  Markovian  queueing  networks,  most  of  the  quantities  of  interest,  for  example  the 
blocking  probability  and  the  waiting  time  for  customers  in  various  queues,  can  be  expressed 
in  terms  of  the  steady-state  probability  distributions,  which  are  the  solutions  of  the 
Kolmogorov  balance  equations.  The  resulting  matrix  system  has  dimension  N,  where  N  is 
the  total  number  of  states  in  the  network.  Tliis  matrix  has  positive  diagonal  elements  and 
non-positive  off-diagonal  elements.  The  graph  of  such  a  matrix  derived  from  a  ^-queue 
model  is  the  same  as  that  of  a  ^-dimensional  discrete  Lapladan.  The  matrix  is  non-symmetric 
and  is  known  to  have  a  one  dimensional  null-space.  The  steady-state  probability  distribution 
is  the  normalized  right  null- vector  of  this  matrix. 

However,  even  for  systems  with  relatively  small  numbers  of  queues,  say  4,  and  a  small 
number  of  waiting  spaces  per  queue,  say  20,  the  order  N  of  the  matrix  can  be  huge,  in  this 
case  it  is  16,000.  Hence  the  matrix  equations  are  rarely  solved  by  direct  methods  such  as 
Gaussian  elimination.  Kaufman  [25]  has  considered  a  special  direct  method  which  handles 
one  or  serveral  large  submatrices  by  separation  of  variables,  while  the  remaining  variables 
are  handled  by  a  conventional  Gaussian  elimination  method.  Different  classical  iterative 
methods,  for  example,  the  point  SOR  and  the  block  SOR  methods,  are  also  discussed  there. 
In  this  paper,  preconditioned  conjugate  gradient  methods  are  empolyed.  The  preconditioner 
is  a  singular  matrix  of  order  N  which  can  be  handled  by  separation  of  variables.  Although  the 
orginal  matrix  is  singular,  we  can  reduce  the  problem  to  solving  a  non  singular 
inhomogeneous  system  by  computing  the  component  of  the  eigenvector  which  is  jrthogonal 
to  the  null-space  of  this  chosen  separable  problem.  Since  the  states  remaining  are  precisely 
those  at  which  interactions  between  the  queues  take  place,  the  effective  dimension  of  the 


problem  can  usually  be  reduced  by  an  order  n,  where  n  is  the  individual  queue  size. 

In  section  2,  we  first  discuss  how  the  balance  equations  are  generated  for  different 
overflow  queucing  models,  and  the  properties  of  the  resulting  matrices.  We  then  discuss  how 
to  change  the  problems  of  finding  the  null-vectors  of  these  matrices,  whidi  are  of  order 
iV  =  n«,  into  linear  inhomogeneous  systems  of  order  n«~^.  Preconditioned  conjugate  gradient 
type  iterative  methods  are  then  introduced.  We  discuss  how  to  implement  Aese  iterative 
methods  to  solve  the  linear  systems. 

In  section  3,  we  analyze  the  convergence  rate  of  the  method  for  systems  with  very  large 
queue  size.  The  convergence  rate  depends  on  the  spectrum  of  the  iteration  matrices.  We  find 
that  the  eigenvalues  of  these  matrices  are  clustered  with  only  a  few  outlying  eigenvalues. 
From  these  results,  the  fast  convergence  of  the  method  follows. 

In  section  4,  we  report  on  the  numerical  res'ults  for  the  models  discussed  in  section  2.  A 
comparison  is  made  between  this  method  and  the  point  SOR  method.  We  see  that  our 
method  has  a  much  better  performance.  In  fact,  the  number  of  iteration  required  to  attain  a 
given  accmacy  is  almost  constant  independent  of  the  queue  size. 


Section  2.  The  Equations  and  The  Methods 


A  Markovian  analysis  of  a  queueing  network  based  on  solving  the  Kolmogorov 
equations  (see  Neuts  [30])  for  the  steady-state  probability  distribution  involves  finding  the 
ntill-vector  of  a  large,  sparse  structured  matrix.  In  this  section,  we  will  derive  these 
Kolmogorov  equations,  construct  such  matrices  and  introduce  the  iterative  methods  used  in 
finding  the  null-vectors  for  such  matrices. 

Let  us  first  introduce  the  notations  that  we  will  be  using.  Assume  that  the  network  has 
q  queues  recdving  customers  from  q  independent  Poisson  soxirces,  see  Kleinrock  [23].  In  the 
j-th  queue  there  are  *,  parallel  servers,  and  «,-*;—!  waiting  spaces.  Customers  enter  the 
queue  with  mean  arrival  rate  X,  >  0.  The  departure  distribution  is  independent  and 
exponential  with  mean  rate  jt,  >  0.  Let  p,  j  j  denote  the  steady-state  probability 
distribution  which  gives  the  probability  of  state  (i^,...,i^),  i.e.,  the  probability  that  i, 
customers  are  in  they-th  queue,  j=l,..,q.  Since  0^ij<nj,  l:sj^q,  the  total  number  of  states 

in  the  system  is  A^  =  [[nj. 
7-1 

It  is  important  to  note  that,  in  general,  there  is  no  need  to  obtain  the  probabilities  of  all 

the  states  in  the  system.  They  may  be  used  however  to  compute  such  quantities  as  the 

blocking  probability,  the  probability  of  overflow  from  one  queue  to  another,  or  the  average 

waiting  time  of  customers  in  various  queues. 

We  begin  with  a  simple  problem,  which  gives  the  idea  of  how  the  balance  equations  and 
the  corresponding  matrix  are  generated.  This  problem  is  sepvable  and  can  be  solved  easily. 
It  will  be  used  as  the  preconditioner  for  more  complicated  models. 


I  2.1        The  Free  Model 

For  the  problems  which  we  will  discuss  in  this  paper,  the  balance  equations  are 
generated  by  considering  die  rate  at  which  a  state  is  left  and  the  rate  and  state  from  whidi 
that  state  is  entered.  For  example,  let  us  consider  a  two-queue  network  with  no  interaction 
between  the  queues.  In  particular,  customers  going  into  a  full  queue  are  lost.  If  B^  is  the 
Kronecker  delta,  then  the  balance  eqxiations  are 

|Xi(l-8j„^_i)  +  X2(l-8y„,-i)  +  »timin(i,Ji)  +  |t2minO.*2)U/,/ 

=  Xi(l-8,o)p,-v  +  >fc)(l-8ta^-i)niin(«+l,Ji)p,+i^  (2.1.1) 


+  X2(l-8yo)Pv-i+  M^(l-8y«,-i)minO  +  l,*2)/'/^+ 


i» 


for  0^i<n^,  Osj</i2.  The  left  hand  side  of  (2.1.1)  indicates  the  rate  at  which  state  (jj)  is 
left  and  the  right  hand  side  indicates  from  which  states  and  the  rate  at  which  state  (ij)  is 
entered.  If  we  write  the  steady-state  probability  distribution  as 

P0-(P0fi  »  P0,i  .  •  •  •  <  Po^j-l  •  ^1,0 Pl/>i-i.  •  •      ■  •  P/ij- 1^,-1  )   » 

where  •  denotes  the  transposition,  then  (2.1.1)  can  be  written  as  Dp^  =  CpQ  where  D  is 
diagonal  and  C  has  zero  diagonal  entries  but  non-positive  off-diagonal  entries.  Let  Ao"»Z?-C. 
Then  the  steady-state  probability  distribution  is  just  the  right  null-vector  of  Aq,  i.e., 

AqPo  =  0,  (2.1.2) 

where  Aq  is  of  order  N,  N=n  jt2.  ^ce  Po'm  probability  distribution,  we  require 

2  2^/^  =  1.  (2.1.3) 

y-0  /-o 

P,j  ^  0.  (2.1.4) 

We  will  see  that  these  constraints  will  uniquely  determine  p^. 
From  (2.1.1),  we  see  that  Aq  is  separable,  in  fact 


5- 


Ao  =  Gi  ®  /„,  +  /„^  ®  G:, 


(2.1.5) 


where    G,  = 


-X,  X,+  M,,     -2m., 

-X,     X,+2m./   -3m., 


0 


-X,     X,+*,M.,    -JjM./ 


0 


-X,     X,+j,M.,    -J/M-/ 


(2.1.6) 


are  matrices  of  order  n,,  and  I^  is  an  identity  matrix  of  order  k.  Unless  ambiguity  arise,  we 
will  drop  the  subscript  for  /.  Notice  that  the  graph  of  Aq  is  the  same  as  the  graph  of  the 
discrete  Lapladan  on  a  square  with  mesh  size  (n,  — 1)"^.  We  claim  that  the  G,  and  Aq  have 
one  dimensional  null-spaces.  In  fact  we  have  (see  Berman  and  Plemmons  [5] ) 

LEMMA  2.1.1 

An  irreducible  matrix  A  with  zero  column  sums,  strictly  positive  diagonal  and  non- 
positive  off-diagonal  entries,  has  a  one  dimensional  null-space.  The  corresponding  null-vector 
can  be  chosen  to  have  positive  entries. 

Proof:  Let  us  write  A  =  D  -  C,  where  D  is  the  diagonal  matrix  containing  all  the 

diagonal  entries  of  A.  We  have  Djj  >  0  and  Cji,  &  0.  Since  diagonal  entries  do  not  affect  the 
connectivity  of  the  underlying  graph  of  A,  C  and  hence  C  D~^  are  irreducible.  By  the  facts 
that  A  has  column  sums  equal  to  zero  and  the  entries  of  C  D~^  are  all  non-negative, 
(CD~^)*1=1,  where  1  is  the  vector  of  all  ones.  Hence  ||  (C  D~^)*  ||  ,  =  1.  Since 
II  (CZ?-^)*  II,  a:  p((CD-T)  =  p(C/)"0,  we  find  that  l&p(CZ)-^).  But  1  is  an 
eigenvalue  of  (CD~^)*,  hence  p(CD"^)  =  l.  By  Perron-Frobcnius  theory  on  nonnegative 
irreducible  matrices,  see  Varga  [40],  p(C  D~^)  =  1  is  a  simple  eigenvalue  of  C  D~^  and  the 
corresponding  eigenvector  has  positive  entries.  Hence  A  has  a  one  dimensional  null-space 
with  positive  null-vector,     a 


Qearly  G,  wtisfies  the  assumptions  of  the  lemma.  Thus  it  has  a  one  dimensional  null- 
space  with  positive  null-vector  and,  by  (2.1.5),  so  does  Aq.  Hence  the  solution  pq  to  (2.1.2)  - 
(2.1.4)  exists  and  is  unique.  In  particular,  the  positivity  constraints  (2.1.4)  can  always  be 
satisfied.  We  remark  that  by  the  Gerschgorin  theorrai:  (sec  Varga  [40]),  except  the  zero 
eigenvalue,  all  the  other  eigenvalues  of  the  G,  and  hence  those  of  Aq  are  positive.  In  view  of 
(2.1.5),  the  null-vector  pq  of  A o  can  be  expressed  in  term  of  the  null-vector  g,  of  G,  as 
Pq  =  g^  ®  g^.  To  find  the  null-vector  of  G„  we  first  notice  that  G,  can  be  symmetrized  by  a 
diagonal  matrix.  More  precisely,  if  we  define  S,  =  diag  ( 'dy,  .  .  .  ,'d„^  ),  i=l,2,  with 


U  = 


t^l^i^^'  ^<>-''" 


(2.1.7) 


1  ;=i. 

then  5f  ^  Gt  S,  is  symmetric.  Here  a,  is  the  normalization  constant  such  that 

i;  Sf  1,  =  ^('djf  =  1,  (2.1.8) 

where  1,  denotes  the  n,-vector  of  all  ones.   Since  C,  have  zero  column  sums,  i.e., 

1,*  G,  =  0,  (2.1.9) 

we  have, 

G,5?l,  =  5/g;i,  =  0.  (2.1.10) 

Thus  Sj  1/  is  the  null-vector  for  G,.  Hence, 

Po  =  52  1  =  (5i  ®  SjY  (li  ®   1;).  (2.1.11) 

By  (2.1.8),  Pq  also  satisfies  the  summation  constraint  (2.1.3). 

We  remark  that  the  operator  Aq  has  its  analogy  in  the  continuous  case.  It  resembles  the 
finite  difference  approximation  to  a  variable  coefficient  elliptic  operator  with  a  transport  term 
acting  on  a  rectangular  region  with  Neumann  boundary  conditions  on  every  sides.  A  simple 
way  to  see  this  is  to  expand  pij  in  (2.1.1)  formally  in  Taylor's  series  with  mesh  sizes 


(n,-l)"^  and  (ny-l)~^.  We  also  note  that  if  X,  =  p.,  =  j,  =  1  in  (2.1.6),  then  the  resulting 
Aq  is  just  the  usual  5-points  difference  operator  for  ths  Lapladan  equation  on  a  same  region 
with  Neumann  boundary  conditions  on  every  sides.  Thus  the  separability  of  Aq  and  an 
analytic  solution  as  in  (2.1.11)  are  expected.  We  will  return  to  this  analogy  in  9  2.3. 

The  matrix  Aq,  ±ough  singular,  will  be  used  as  the  preconditioner  for  more  complicated 
models.  It  is  thus  necessary  to  define  an  appropriate  inverse  of  Aq.  We  proceed  by  first 
obtaining  a  spectral  decomposition  of  Aq.  Since  5f^  G,  S/  is  symmetric,  there  exists 
orthogonal  matrices  Q,  and  diagonal  matrices  V,,  i'=l,2,  such  that 

C;  Sr^  G,  S,  Qi  =  r„  i=l,2.  (2.1.12) 

Here  F,  =  diag('y,  ^  ,  7,^,  ....  y,^^  contains  the  eigenvalues  {"iijyjLi  of  G,.  By  lemma 
2.1.1,  each  G;  has  only  one  zero  eigenvalue.  Let  us  set  -y,^  =0.  By  (2.1.5),  Aq  can  then  be 
diagonalized  by  5^  Ci  ®  52  Cj.  More  precisely, 

(G:  S:^  ®  Ql  S{^)  Ao  (5i  C:  ®  52  G2)  =  (  Fi  ®  /  +  /  ®  Fj )  -  2,        (2.1.13) 
where 

2  =  diag  (Di ,  Zj 2,^), 

with 

Ij  =  diag  (7v+72,i Iij+ll^^  =  ^2  +  lij  ■  /,  l^j^n^.        (2.1.14) 

Here  2  is  diagonal  and  contains  all  the  eigenvalues  of  Aq.  Since  only  7^^  =  ^jji  =  0.  oaly 
the  last  block  2„  =  F2  is  singular.  From  this  spectral  decomposition  of  Aq,  we  see  that 
^'^  -  {Poi  ®  ^"'(■Aq).  where  /m(Ao)  is  the  range  of  Aq. 

Though  2„  is  singular,  it  is  diagonal.  Thus  it  is  easy  to  define  its  generalized  inverse,  or 
the  {l}-inverse,  2*;  see  Ben-Isreal  and  Greville  [4].  In  fact, 

2„^  -  r^  =  diag  (72-i 72-^-1 ,  -r).  (2.1.15) 

with  7  defined  arbitrarily.  Since  2y^  is  well-defined  for  l^j<n2,  the  generalized  inverse  2* 


of  1  is  given  by 

2*  =  diag(2ri 2-ii.2„> 

By  (2.1.13),  the  generalized  inverse  Aq  of  Aq  is  thus  given  by 

A^  =  (5:  G:  ®  5^  G^)  2*  (Gl  Sf^  ®  0:  ^2'')-  (2116) 

It  is  important  to  note  that  Aq    is  invertible  on  /m(Ao).   More  precisely,  for  all 

X  €  /m(i4o),  there  exists  a  unique  y  i  /m(Ao)  such  that  Aq  y  =  x.  In  fact,  y  =  Aq  jc  and  we 

have  y  =  AqA^  y  =  Aq  Aq  y.   This  follows  from  the  fart  that  y  (.  Im{AQ)  if  and  only  if  the 

last  entry  of  (Gi^Sf^  ®  Ql  82^)  y  iizcTO. 

The  generalization  to  ^ -queue  free  models  is  immediate.  By  free  models,  we  mean  that 
there  is  no  interaction  between  different  queues.  More  precisely,  customers  entering  a  queue 
will  be  blocked  and  lost  if  that  queue  is  full.  Thus  the  queues  are  totally  separated  from  each 
other.  We  summarize  all  the  results  in  the  following  lemma.  The  proof  is  similar  to  the  2' 

queue  case,  and  will  not  be  given.    For  simplicity,  let  us  denote    ®  £,  ■  f^  ®   •  •  •  ®  E^ 

and  Ef  ■  /„ ,  for  any  arbitrary  sequence  of  matrices  {  E,  }f.^,  with  respective  dimension  n,. 

LEMMA  2.1.2 

The  following  statements  hold  for  arbitrary  ^-queue  free  models. 

(i)  The  generating  matrix  Aq  is  of  order  N  =  {[n,,  and  is  given  by 

/-I 

Ao=i  ^G,\  (2.1.17) 

where  G,   are  given  by   (2.1.5)   and  8,y   is  the  Kronecker  delta.    Thus  Aq  satisfies  the 
assumptions  in  lemma  2.1.1. 

(ii)  Aq  has  the  following  spectral  decomposition 

Q'S-^AqSQ  =  1,  (2.1.18) 


where  G  -  ^  C?/.  ■S  ■  &  5,  and  2  -  2)  ^  rf".  Here  (?,  and  P,  are  defined  by  (2.1.12)  and 

/"I  ("1  JmX  '"i 

5,  are  defined  by  (2.1.7)  and  (2.1.8). 

(iii)  The  steady-state  probability  distribution  is  given  by 

Po  =  S^  1.  (2.1.19) 

where  1  =  (1,1 1)  €  Z?"^'. 

(iv)  From  the  spectral  decomposition  (ii),  we  have 

R^'^iPoyeimiAo),  (2.1.20) 

where 

7m(/lo)  =  {  X  ^  /?^'  II*  X  =  0  }.  (2.1.21) 

(v)  The  generalized  inverse  (  {l}-inverse  )  of  Ag  is  given  by 

A^  =SQ  2*  Q'  S-K  (2.1.22) 

where 

r(2„)-i    i<N, 
(2*)«  =  (arbitrary  i^S.  (2.1.23) 

(vi)  Aq   is  invertible  on  /m(i4o).  More  precisely,  given  any  x  €  /^(Aq),  there  exists  a  unique 
y  €  /m(i4o)  such  that  Aq  y  =  x  with  >  =  Aq  x.  We  thus  have 

y  =  AQA;y  =  a;  Aoy  for  all  ^  6  /m(Ao).  (2.1.24) 

(vi)  For  any  p  C  R^ ,  there  exists  unique  scalar  a  and  5  €  /'"(Aq),  such  that 

p  =  apo  +  A^  i.  a  (2.1.25) 

Beginning  in  9  2.3,  we  will  discuss  models  having  overflow  capacity.  These  models  lead 
to  problems  that  do  not  have  analytic  solutions.  Thus  the  balance  equations  are  usually  solved 
by  numerical  methods.  The  iterative  methods  introduced  in  the  next  section  will  be  used  to 
solve  linear  systems  arising  from  these  overflow  queueing  networks. 
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i  2.2        Preconditioned  Coi^agatc  Gradient  Methods 

Given  a  symmetric,  positive  deHnite  matrix  B  of  order  m,  a  very  attractive  iterative 
scheme  for  the  solution  of  the  linear  system  Bx  =  b  is  the  conjugate  gradient  method.  For  an 
arbitrary  initial  vector  xq,  the  method  generates  a  sequence  of  approximations  {xj}  to  the 
solution  X  defmed  by 

Xj,,  =  X,  +  ajdj,  aj  =  ^^i^. 

rj,,  =  rj-  ajBdj,  ^j  =   <^y^i'^/^i>: ,  (2.2.1) 

where  dQ  =  r^  =  b-Bx^,  and  <  •  ,  •  >2  is  the  /i  inner  product  in  /?"*.  The  method  was  first 
discussed  by  Hcstenes  and  Stiefel  [20],  see  also  Luenberger  [28].  The  y-th  iterant  xj 
minimizes  the  error  functional 

II  x-xj  lU  -  <  x-xj  ,  B{x-xj)  >'}  (2.2.2) 

over    the    Krylov    space   Xg  +  Span  <rQ  ,  Br^  ,  .  .  .  ,  B^'^  ro>.    By    expanding  Xj,    b    and 

X  =  fl"-i  in  the  eigenvectors  of  fl,  we  have 

^ — ^  s   min    max     1  -X  P{\)    ;  (2.2.3) 

Ik-Xollfl  P^.P,.,^i^yB) 

see  Luenberger  [28].  Here  Pj,.]^  is  the  space  of  all  polynomials  of  degree  k- 1  and  cr(5)  is  the 
spectrum  of  B.  Thus  in  exact  arithmetic,  the  solution  can  be  obtained  within  m  step.  In 
practice,  if  B  has  a  good  spectrum,  for  example  a  moderate  condition  number  or  clustered 
eigenvalues,  then  the  method  will  give  a  very  accurate  result  in  a  few  steps.  An  example  is 
when  B  is  of  the  form  w/  +  L  +  C/,  where  u  is  a  scalar,  L  is  a  matrix  of  low  rank  and  t/  is  a 
matrix  of  small  norm.  An  approximate  upper  boimd  on  the  number  of  iterations  required  to 
make  the  relative  error  |lx-JCy||  g  I  I^x-xqH  5  s  e  is 


%hi(^) 


^^^),  (2.2.4) 
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where  k(5)  is  the  condition  number  of  B.  From  (2.2.1),  we  see  that  this  method  requires  5m 
operations  plus  one  matrix-vector  multiplication  per  iteration.  Since  B  is  required  only  in  the 
form  Bd,  no  explicit  knowledge  of  the  matrix  elements  of  B  is  required. 

When  the  spectrum  of  B  is  not  favorable,  we  can  accelerate  the  rate  of  convergence  by 
preconditioning  the  system.  Given  any  splitting  B  =  Bq  +  R  of  B,  where  Bq  is  symmetric  and 
positive  definite,  we  can  apply  the  conjugate  gradient  method  to  the  transformed, 
preconditioned  system  B  Bq^  y  =  b  with  x  =  Bq^  y.  We  remark  that  the  resulting  algorithm 
is  equivalent  to  applying  the  ordinary  conjugate  gradient  method  to  the  symmetric  matrix 
Bq'*  B  Bq'*  but  using  a  different  basis  for  the  given  space.  In  fact,  replacing  B  and  every 
vector  d  in  (2.2.1)  by  Bq'*  B  Bq"*  and  B^  d,  we  get  the  same  preconditioned  algorithm. 
Notice  that  B  Bq-  =  I  +  R  Bq^,  thus  we  can  take  advantage  of  the  sparsity  in  R  when  we 
compute  R  B^-  d.  We  remark  that  we  can  also  precondition  the  system  from  the  left,  i.e.,  we 
solve  for  x  in  B^^  B  x  =  Bq^  b.  However,  in  each  step,  we  have  to  compute  Bq"  R  d  which, 
in  general,  is  not  sparse.  The  choice  of  flg  depends  very  much  on  the  problem  itself.  Since  in 
each  iteration,  we  have  to  compute  a  vector  of  the  form  Bq  '■  d,  it  is  necessary  that  the  system 
Bqx  =  d  can  be  solved  economically.  To  speed  up  the  convergence,  Bq'-  should  also  be 
chosen  to  be  an  approximate  inverse  of  B.  Typically  we  want  Bq'-  B  to  be  of  the  form 
(dI  +  L  +  U.  An  analysis  of  this  technique  is  given  in  Hestenes  [21];  see  also  Concus, 
Golub  and  O'Leary  [11].  For  the  generalization  to  positive  semi-defmite  system,  see  Lewis 
and  Rehm  [26]. 

When  B  is  nonsymmetric,  we  may  observe  that  fl*  fl  is  positive  definite,  symmetric  and 
solve  the  normal  equation  B'  B  x  =  B'  b  instead.  Since  k(B'  B)  =  k'(B),  (2.2.4)  suggests 
that  if  5  is  poorly  conditioned,  then  the  convergence  may  be  very  slow.  This  leads  us  to 
consider  other  conjugate  gradient  type  methods.  One  of  them  is  the  Orthodir  method 
discussed  by  Yoxmg  and  Jea  [41];  see  also  Elman  [15].  Given  "in  arbitrary  initial  vector  Iq, 
we  generate  a  sequence  {xj}  of  approximations  to  the  solution  x,  by 


,  .. .  ,        -12- 

jj,,  =  xj  +  ajdj,  ""J  -  TSdJ^Bd;^' 

rj+i  =  rj-ajBdj,  (2.2.5) 

J.  -<B^d,^dL>, 

Here  fq  =  d^  =  b  —  Bxq.  In  this  algorithm,  we  do  not  need  to  computs  B'y,  but  need  extra 
work  and  storage  for  ^j^^  and  d^.  At  each  step,  the  method  chooses  the  xj  that  minimizes  the 
residual  norm  ||  fl  xy  -  t  ||  2  over  the  same  Krylov  space  xq  +  Span<ro ,  BrQ  ,  .  .  .  ,  B^"Vo> 
that  was  considered  previously. 

The  reason  that  we  choose  Orthodir  method  from  among  all  the  other  conjugate  type 
methods  is  that  it  is  guaranted  to  converge  even  if  the  symmetric  part  of  the  iteration  matrix 
is  not  positive-definite.  There  are  no  indications  that  the  iteration  matrices  arising  from 
queueing  networks  enjoy  this  property.  From  (2.2.5),  the  j-th  iteration  requires 
(30  +  l)  +  4)m  operations,  one  matrix-vector  multiplication,  and  (20"+2)-t-2)m  storage  spaces 
respectively.  This  method  is  therefore  practical  when  the  number  of  iterations  required  to 
acquire  a  given  accuracy  remains  small  and  roughly  constant,  independently  of  m.  When  this 
is  not  the  case,  there  are  two  alternative  ways  of  reducing  the  cost  in  each  step,  see  F.lman 
[15].  One  is  to  restart  the  whole  process  after  a  certain  number  of  iterations.  The  other  one 
is  to  keep  only  the  last  few  d^  vectors.  However,  neither  of  these  algorithms  is  guaranteed  to 
converge  for  general  systems. 

Similar  to  the  symmetric  case,  we  can  split  B  =  Bq  +  R  where  Bq  is  nonsingular,  not 
necessarily  symmetric  and  apply  the  Orthodir  method  to  the  preconditioned  matrices 
B  Bq^  =  I  +  R  Bq^  or  Bq^  B  =  I  +  Bq^  R.  These  differ  mainly  in  the  inner  products  and 
the  Krylov  spaces  over  which  we  do  our  minimization.  However,  as  mentioned  above,  if  R 
is  sparse,  then  the  former  requires  much  less  storage  and  possibly  less  work  than  the  latter, 
since  for  any  vector  x,  R  Bq^  x  ii  sparse  while  Bq^  R  x  ii  not. 

Finally  let  us  consider  the  case  where  B  is  nonsymmetric  and  singxilar,  see  Kammerer 
and  Nashed  [22].    We  note  that  by  Fredholm's  alternative,  the  solution  to  B  x  =  b  will  not 
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exist  unless  b  ^  Im(B).  For  symmetric  B,  by  the  spectral  decomposition,  we  know  that  fl^^^^j 
is  invertible  on  Im(B).  This  is  not  true  for  general  nonsymmetric  matrices.  However,  in  9 
2.6.1,  we  will  consider  a  non-symmetric  problem  where  b  €  Im(B)  andB\;„^g^  is  invertible  on 
Im(B).  We  note  that  the  equation  B\j„^g'fX  =  b  can  be  solved  by  the  Orthodir  method 
without  first  finding  B\j„^b^  explicitly.  In  fact,  when  the  initial  iterant  Xq  C  Im(B),  we  have 
r^  =  Jq  =  b  -  fl|/„(5)  xq  =  b  -  B  xq  ^  Im(B).  From  (2.2.5),  it  is  dear  that  rj,  dj  and  Xj  will . 
all  be  in  Im(B)  for  j  >  0.  Thus  there  is  no  need  to  carry  out  the  projection  of  B  onto  Im(B). 
A  suitable  choice  for  Xq  ^  Im{B)  ii  Xq  -  b  or  Xq  =  0. 

In  the  following  two  sub-sections,  we  will  consider  two  examples  which  will  give  insight 
in  how  to  choose  a  suitable  preconditioners  for  the  queueing  problems  that  we  will  discuss 
later. 

9  2.2.1        Preconditioning  Neamann  BVP  by  Dirichlet  BVP 

Let  us  first  consider  the  continuous  case,  see  Bj0rstad  and  Widlund  [7].  Let  (1  be  an 
open  and  bounded  region  in  R'^  with  a  smooth  boundary  dCl.  Let  -q  denote  the  oy'rMaid  unit 
normal  of  CI.  For  any  g  ^  H^dCI),  let  u  be  the  solution  to 

/A  M  =  0  in  n, 

\u  =  g    on  an.  (2.2.6) 

By  the  regularity  theorem  of  elliptic  theory,  sec  Lions  and  Magenes  [27,  p.l89],  u  €  H^{0,). 
For  any  w  6  H^(Cl),  we  have,  by  Green's  theorem, 

/-^H'  =  /wA«    +   JVuVw  =  JVuVw.  (2.2.7) 

Since  u,w^  fi^{Cl),  the  right  hand  side  is  boimded.  Let  7  be  the  trace  operator  on  dCl.  It 
follows  from  Lions  and  Magenes  [27,  p. 39]  that  -y  is  a  continuous  mapping  from  H^{Ci)  onto 

H^dSl).   In  particular,   •yw  €  H^dQ).   Hence   -^  in  (2.2.7)   defines  a  bounded  linear 
functional  on  H'*{dCl).  Thus 


.1/1 


-^  e  H-\dn),  (2.2.8) 


the  dual  space  of  H^dil).  Hence  we  see  that  this  Dirichlet  to  Neumann  map  which  maps  g  to 

—  involves  a  lost  of  a  derivative  in  LjCdft). 
di\ 

We  expect  that  in  a  discrete  case,  such  a  map  will  have  a  spectral  condition  number 
proportional  to  the  number  of  degrees  of  freedom  associated  with  dCl.  The  following 
example  illustrates  this  fact. 

Consider  the  n  by  n  matrices 

t/- tridiag( -1  ,2, -1), 
W-e„  el 

We  note  that  the  eigenpair  {Q  ,  A  }  of  t/  is  given  by 

A;  =  4  sin^(  2^i^),  l^j^n,  (2.2.9) 


and 


Define  B  and  B^  by 


Cy  =  (-ZT)'sin(-^),  \^j,k^n.  (2.2.10) 


flo- t/®/„ +  /„®f/,  (2.2.11) 

5  -  C/®/„  + /„  ®  V.  (2.2.12) 

We  note  that  Bq  is  the  discrete  Lapladan  operator  on  [0,1]^  with  mesh  size  h  =  (n  +  l)~\  and 
with  Dirichlet  boundary  conditions  on  every  sides.  B  is  the  same  operator  on  the  same  region 
but  with  Neimiann  boundary  condition  on  the  side  y  =  I. 

Qearly  B  =  Bq  -  I„®  e„  e'„  and 

(C  ®  G*)  Bo  (G  ®  G)  =  (A  ®  /  +  /  ®  A)  -  2.  (2.2.13) 

Let  us  compute  the  spectrum  of  the  preconditioned  matrix 

Using  (2.2.13),  we  have 
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=  diag(fli fl„),  (2.2.14) 

where 

fly  =  /-27'*C•^^•G27^  (2.2.15) 

2y  =  diag  (Ai  +  Ay A„  +  Ay).  (2.2.16) 

Notice  that  the  second  term  in  (2.2.15)  is  a  rank  one  matrii,  hence  all  except  one  of  the 
eigenvalues  of  Bj  are  equal  to  1.  The  outlying  eigenvalue  is  given  by 

Thus  by  (2.2.14),  we  see  that  all  except  n  of  the  eigenvalues  of  Bq"*  B  Bq'*  are  equal  to  1. 
Those  outlying  eigenvalues  are  given  by  X|(fly),  1  s  y  s  n. 

Using  (2.2.9)  and  (2.2.10),  we  have,  for  1  s  7  s  n, 

r,2                          n                    sin2-^2!_ 
rf   =  y        Q"*        =  /     2     V  y   n  +  1 

^      AA,  +  Ay-^  +  l\-^,4^i^:^:n_^4^^:     *u^         - 

2(71  +  1)  2(n  +  l) 

'^'-^ 
=  (tTt)  «y  2  ^ .  (2.2.17) 


"^1      ^-M-2a,cos^  +  a/' 
.^       n  +  1       •' 


where  Oy  is  the  smallest  root  of 

a;  -  (  2  +  Ay  )  ay  +  1  =  0,  \  ■s  j  s  n. 

Since  the  constant  term  of  this  quadratic  equation  is  1,  we  have, 

fl,  =  1  +  H  A,-(A,  +  A?  /  4)"  =  ^ ; <  1. 

'  ^  ■'  ^  1    +    \4Ay  +  (Ay    +    Ay2/4)^ 

To  evaluate  the  sum  in  (2.2.17),  we  first  define,  for  it  a  0  and  1  s  7  s  n. 


(2.2.18) 


1  -  2ayCOSx  +  aj  ' 


n  -  ifji^)  cos  kx  dx. 

0 


By  applying  the  Poisson  integral  formula  to  the  real  harmonic  function  aj  cos  Jkx,  it  a  0,  we 
have 
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cos  ibr 


0  1  —  2aj  coix  +  aj 


IT  a, 
l-aj 


Using  the  relation 

sin^x  cos  fat   =    — cos  fcc   -    — cos  (ifc+2)x   -    — cos  (m-2)x, 
2  4  4 


we  obtain 


IT 

2 

*=o. 

i' 

''J 

*=1, 

_1 

A 

•n  aj' 

■^(1- 

-/) 

*>1. 

Fi  = 


Thus  the  Poisson  summation  formula,  see  Rudin  [36],  gives 


(2.2.19) 


Vy(iT)  ] 


=  f  a^[n  +  2  2F^,(,,i)] 


i-i 


=   fl/ 


1  -  (a,)2- 


(2.2.20) 


1  -  (a^)^«-2 

We  note  that  aj  in  (2.2.18)  decreases  monotonically  from  1  to  3-2*2  as  Ay  varies  from  0  to 
4.  The  ratio  dj/aj  is  also  monotone  as  a  function  of  Oj,  and 

1  -  -;^°j  >  >'i(Bj)  =  1  -  dj  >  I  -  aj  >  0,        1  s  7  s  n.  (2.2.21) 

We  first  claim  that  the  n  eigenvalues,  ^i{Bj),  1  s  y  s  «,  are  not  clustered.  By  (2.2.18), 
(2.2.9)  and  the  inequality  sin  x  ^  x  for  all  x  2  0,  we  have, 


sin^  .y^..)'* 


2(n  +  l) 


n+1    • 


1  s  7  s  n. 


Thus  (2.2.21)  gives 


^(j+DlT 


^W^^--;;^''j^      nVi  "'         i^;^"-  (2.2.22) 

Hence,  given  any  0  <  8  <  1,  the  number  of  eigenvalues  of  Bq"*  B  Bq"*  lying  outside  the 
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interval  [8,1]  increases  linearly  with  n. 

Next  we  claim  that  the  condition  number  of  Bq'*  B  B^'*  also  increases  linearly  with  n. 
By  (2.2.18),  (2.2.9)  and  the  inequality  sin      j^"^      a  -j^  for  1  s  ;  s  n,  we  have 

Thus  (2.2.21)  gives 

X,(Bj)^l-aj^-^^^^,  l^j^n. 

Hence  the  condition  number  of  Bq"*  3  Bq"*  is  of  the  order  0(n).  From  (2.2.3)  and 
(2.2.4),  we  see  that  the  conjugate  gradient  method  when  applied  to  this  preconditioned 
system  might  converge  slowly  for  some  particular  initial  data  and  right  hand  sides.  This 
example  shows  that  using  a  Neumann  problem  to  precondition  a  Dirichlet  problem  leads  to  a 
non-optimal  method. 

i  2.2.2         Preconditioning  Oblique  BVP  by  Neumann  BVP 

The  queueing  problems  that  we  will  discuss  in  the  next  few  sections  are  very  similar  to 
the  oblique  boundary-value  problems  where  the  direction  vector  -y  of  the  oblique  derivative 
makes  a  constant  angle  with  the  boundary.  Thus  for  simplicity,  let  us  assimie  that 

-,  =  n  -^  T,  (2.2.23) 

where  t  is  the  unit  tangential  vector  along  dfl.  Here  O  is  an  open  bounded  set  in  R^,  with  a 

sufficiently  smooth  boundary  dCl.  In  the  following,  let  C  denote  any  generic  positive  constant 

that  depends  only  on  CI.  Consider  the  following  Hilbert  space 

E'{gi  H-\dn)  \fgdr  =  0},  (2.2.24) 

equipped  with  the  usual  H~\dn)  norm.   For  any  g|  €  E,  let  u^  be  the  solution  to 

A  u  =  0    in  n, 

,  (2.2.25) 

j!L  =  g^on  an, 

[d-r] 
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normalized  by 


/«  =  0. 


(2.2.26) 


By  the  regularity  theorem,  see  Lions  and  Magenes  [27,  p.l89],  u^  €  H^(Ci).  Moreover,  the 
following  a  priori  bound  holds: 


i«Hn) 


C  II  g:  ||«- 


'(30)- 


By  the  trace  theorem,  u^  i  H^dSl)  and 


11-^11 
Thus  by  (2.2.23)  and  (2.2.27), 


=sC|I«, 


iiH*(3n) 


^Clju 


i  iiw'(n)- 


and 


du, 


dy         dr\  dj        *^        dT  ^      ^ 


du 


(2.2.27) 


(2.2.28) 


(2.2.29) 


C7tf  1 

lUan,  ^  II  *i  ll«-(an)  "^  H  "iT  '■''-(an,  ^  ^  ||  g,  ||„.,(,„,.         (2.2.30) 


Qy   iiw-»(an) 


du 


1  . 


Hence  the  mapping  T  which  maps  g^  to  — —  is  bounded.  We  note  that  Tg^  €  £.  In  fact 

/  ^^T  =  /  g,dr  ^f^,r=f^dr=fdu,^0.  (2.2.31) 

an  <''/  an  an  °^  an  <*^  an 

Thus  T  maps  E  into  itself.  We  claim  that  T  is  actually  a  homeomorphism  from  E  onto  itself. 
First  we  prove  that  T  is  one-to-one.  This  follows  form  the  fact  that  the  problem 


A  M  =  0    in  n, 

4^  =  «2  on  an. 


(2.2.32) 


with 


/  K  =  0,  (2.2.33) 

n 

admits  only  ui  "  0  *s  solution  when  gj  =  0.  In  fact,  using  (2.2.32)  £ind  Green's  formula 


/  IVu^P  =  -/  u,  Au2  +  /  u,^dr  =  -  /  u.^dr  =  -'Af  d  u^  =  0.       (2.2.34) 
n  n  an      °^  an      ""^  an 

Thus  uj  "  constant.  By  (2.2.33),  «,  "  0-  Hence  Tg^  =  g:  =  0  implies  g^  =  ——  =  0.  In 

ail 
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particular,  T  is  one-to-one. 

Next  we  prove  that  T  is  onto.  We  first  need  to  determine  the  adjoint  problem  of 
(2.2.32).  By  Green's  formula 

fvAu    -    fuAv   =     fv  -- —   -    /  «  - — 

/du  r       du  r        dv 

V—--    Jv-—    -    Ju  -— - 
an     ^1         an     ^t         f^      dT) 

=    /v|^  -    Juij^-^),  (2.2.35) 

an     ^y         an        ^^        ^^ 

where  the  last  equality  is  obtained  by  integration  by  part.  Thus  we  sec  that  the  adjoint 
problem  of  (2.2.32)  is 

^/  =  '"°"-  (2.2.36) 

-^  =  0  on  an, 

da 
where  the  direction  vector  a  is  given  by 

<y  =  Ti  -  T.  (2.2.37) 

Similar  to  (2.2.34),  we  see  that  the  only  solution  to  (2.2.36)  are  precisely  the  constant 
functions.  Thus  the  compatiblity  condition  for  pjroblem  (2.2.32)  is  the  same  as  that  of 
problem  (2.2.25),  namely 

J  g2d-r  =  0.  (2.2.38) 

an 

Hence,  applying  the  reg\ilarity  theorem  again,  we  have,  for  any  gi  ^  E,  there  exists  a  unique 
Uj  €  H\C1),  which  is  a  solution  to  (2.2.32)  and  (2.2.33).  Moreover,  the  following  a  priori 
bound  holds: 

II  «:  IIhho)  ^  C  II  ^,  ||„-,^,„^.  (2.2.39) 

Since  the  solution  u  to  (2.2.25)  and  (2.2.26)  is  unique,  we  see  that 

T^  =  82-  (2.2.40) 

Thus  T  is  onto. 

Finally  by  (2.2.39)  and  the  trace  theorem,  (see  (2.2.28)) 
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If  =  »:  -  ^  «  «-(a")  (2.2.41) 


and 


II  -^  ll;,-.(,n,  ^  II  *:  "//-van,  ^  "  17  H'^^tao)  ^  ^  "  ^^  "^-(an,-  (2-2-42) 
Hence  T,  the  Neumann  to  oblique  map,  is  a  homeomorphism  from  E  onto  E.  In 
particular,  T  is  well-conditioned.  By  combining  the  results  in  this  sub-section  with  those  in  § 
2.2.1,  it  is  dear  that  the  Dirichlet  to  oblique  map  wiU  involve  a  loss  of  a  derivative  in 
L2idCl).  Hence  we  see  that  the  Neumann  problems  are  better  suited  as  a  preconditioner  for 
the  oblique  derivative  problems  than  the  Dirichlet  problems. 

In  S  3.9,  we  will  consider  a  discrete  version  of  preconditioning  an  oblique  BVP  by  a 
Neumann  BVP  in  a  rectangular  region.  We  will  sec  that  the  singidar  values  of  the 
preconditioned  matrix  are  clustered  around  v2.  Using  (2.2.3),  we  are  then  able  to  prove  the 
optimality  of  the  method. 
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i  2.3         Two-Qoea*  Overflow  Models 

In  this  section,  we  will  discuss  2-queue  networks  with  biiilt-in  overflow  capacity.  We 
assume  here  that  overflow  from  a  queue  are  permitted  only  when  that  queue  is  full.  In  9  2.6, 
we  will  discuss  models  in  which  overflow  can  occur  even  before  the  queue  is  full.  Thus  here 
we  consider  only  two  types  of  overflow  models.  In  one  we  allow  overflow  in  only  one 
direction,  in  the  other  in  both.  The  method  introduced  here  will  be  generalized  to  networks 
with  more  than  two  queues  in  9  2.5.2. 

?  2.3.1  Overflow  In  One  DirectloD 

Let  us  consider  the  model  in  which  overflow  is  permitted  only  from  the  first  queue  into 
the  second.  More  precisely,  customers  entering  the  first  queue  will  wait  and  be  served  by  the 
second  queue  if  all  the  spaces  in  the  first  queue  are  occupied.  On  the  other  hand,  customers 
entering  the  second  queue  are  lost  if  the  second  queue  is  full.  This  model  is  discussed  in 
Kaufman  [25].  The  balance  equations  for  this  model  are  given  by: 

Ui(l-5/„^-i8;„^-i)  +  ^2(l-8y„,-i)  +  Hiniin(/,Ji)  +  ^i2  minO',i,)|p,^ 

=  ^i(l-8;o)/'/-u  +  y-iC^-^in^-J  tmnii+l,sj  pi^^j  (2.3.1) 

+  (^i8ta,-i  +  hX^-^jo)  Pij-i  +  M-2(l-8y,j-L)  minO'  +  l,J2)P/^+i. 

for  OsKrti,  0^j<n2.  This  differs  from  (2.1.1)  only  in  the  coefficients  of  pjj  and  Pij-i. 
The  coefficient  of  Pij-i  indicates  that  customers  are  gained  in  the  second  queue  at  a  rate  X2, 
but  if  the  first  queue  is  full,  additional  customers  will  also  arrive  at  the  second  queue  at  the 
rate  X^. 

Let  p  =  (po,o Pn  -1^  -1)*  be  the  steady-state  probability  distribution  vector  for 

this  problem.  Using  the  notations  in  9  2.1,  we  are  solving  a  homogeneous  system  of  order 
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N  =  n^  n2,  namely, 


Ap  =  (Aq+Rq)  p  =  0, 

Bj— lilj— 1 

j-iy-o 


Pij 


0. 


Here 


with 


Ro  =  i\  '<)  ®  '^1. 


(2.3.2) 
(2.3.3) 
(2.3.4) 

(2.3.5) 


JR,  =  X 


/       **( 


1 
-1    1 


0 


0     -1   1 

-1  0 


(2.3.6) 


a  square  matrix  of  order  rtj.  'ej  denotes  the  j-th  unit  vector  in  /?"'. 

By  (2.3.5)  and  the  fact  that  Aq  satisfies  the  assumptions  of  lemma  2.1.1,  A  also  satisfies 
these  assumptions.  Thus  A  has  a  one  dimensional  null-space  with  a  positive  null-vector. 
Hence  the  solution  p  to  (2.3.2)  -  (2.3.4)  exists  and  is  unique.  We  remark  that  A  and  Aq  have 
the  same  graph.  Since  Aq  and  A  both  have  zero  colimin  sums  and  one  dimensional  null- 
spaces,  it  follows  that 


ImiA)  =  ImiAo)  =  {x  6  ^'^'  |l*  x  =  0}, 


(2.3.7) 


where  1  =  (1,1,. ..,!)*  €  R^'.  Since /Jq  =  A  -  Aq,  /m(/?o)  C  /m(Ao).  By  (2.3.5),  it  is  clear  that 
/m(J?o)  is  the  set  of  aU  x  €  /m(Ao)  with  the  first  (ni-l)n2  entries  equal  to  zero.  More 
precisely,  by  (2.3.7), 


/m(/?o)  =  {x  €  R^'  \x  =  ^e„  <S>y  where  y  €  R"'  with  2>y  =  0  }• 


(2.3.8) 


By  (2.1.25),  there  exists  a  unique  a  and  5o  ^  '"•(•^o)  ^^^  that  p  =  a  p^  +  Aq  Cq.  Since 
A^  ?o  ^  Im(Ao),    we    have,    by    (2.3.7),    1'  p  =  1'  a  pq  +  1'  A^  5o  =  «  1*  Po-    By    the 
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summation  constraints  (2.1.3)  and  (2.3.3),  1'  Po  =  1'  p  =  1.  Thus  a  =  1  and 

P  =  Po  +  ^o"  5o-  (2.3.9) 

Substituting  (2.3.9)  into  (2.3.2)  and  using  (2.1.24),  we  have 

(/  +  /?oAo*)5o=  -^oPo-  (2.3.10) 

By  (2.3.5)  and  (2.1.11),  it  is  easily  checked  that  RqPq  *  0.  Thus  the  problem  of  finding  a 
null- vector  to  (2.3.2)  has  been  transformed  into  the  problem  of  solving  a  linear 
inhomogeneous  system  (2.3.10). 

We  remark  that  (I  +  RqA^)^  =  A  Aq^  for  all  ?  € /m(i4o).  Thus  we  are 
preconditioning  the  equation  (2.3.2)  by  Aq  from  the  right.  Notice  that  the  right  hand  side  of 
(2.3.10)  is  the  product  of  Rq  and  the  null-vector  of  the  preconditioner  Aq.  Thus  it  is 
impossible  to  precondition  the  matrix  A  by  a  non-singular  preconditioner.  If  that  were  done, 
the  right  hand  side  of  the  preconditioned  system  would  be  zero  and  we  would  be  solving  a 
homogeneous  problem  rather  than  an  inhomogeneous  one.  However,  after  fixing  one  degree 
of  freedom  in  the  solution  p ,  it  is  always  possible  to  design  a  non-singular  preconditioner  for 
the  resulting  reduced  system.  This  follows  from  the  fact  that  the  matrix  A  has  a  null-space  of 
dimension  one,  hence  the  resulting  reduced  system  will  no  longer  be  singiilar.  However, 
numerical  results  show  that  such  non-singular  preconditioners  are  in  general  not  as  good  as 
the  singular  one.  We  will  return  to  this  topic  at  the  end  of  this  section  and  in  §  2.4.  Let  us 
remark  that,  although  A  Aq  is  singular,  the  matrix  (I  ■¥  RqAq  )  ii  nonsingular. 

LEMMA  2.3.1 

Consider  a  system  of  the  form 

Ap  =  0 
l'p  =  l 
Pj^O, 

where  A  is  of  the  same  dimension  as  Aq.  If  the  solution  p  exists  and  is  unique  and  1*  A  =  0 
then  the  matrix  (I  +  RqAq  )  is  non-singular,  where  Rq  =  A  -  Aq. 
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Proof:  We  first  note  that  1'  A  =  0  implies  /m(i?o)  C  /'"(Aq).  Hence  (I  +  RqAq  )  maps 
/m(Ao)  into  itself.  Moreover,  the  existence  and  uniqueness  of  p  implies  the  existence  and 
imiqueness  of  a  (q  (  Im(AQ)  that  satisfies 

(I  +  RqAq  )5o  =  ~RoPo- 
Thus  the  matrix  is  invertible  in  /m(Ao).  Suppose  y  is  in  the  kernel  of  this  matrix.  By  (2.1.20), 

there  exists  a  unique  p  and  x  €  /m(i4o)  such  that  y  =  ^  p^  +  x.  Hence  (/  +  ^o-^o^)  y  =  0 

implies  that  -^  Pq=  (/  +  ^o'^o")  ^^  +  fi  Rq^o  Po-  Since  /m(/?o)  C  /'"(-^o),  the  right  hand 

side  is  in  /m(Ao).  Thus  by  (2.1.20),  3  =  0  and  (/  +  RqAq)  x  =0.  Since  x  e  /m(Ao),  the  last 

equation  implies  x  =  0.  Hence  y  =  0.  Thus  the  matrix  is  non-singular,     o 

In  essence,  the  singularity  of  A  has  been  cancelled  by  the  singularity  of  Aq.  We  remark  that  if 
A  is  a  generating  matrix,  then  by  the  conservation  of  flow  in  and  out  of  any  given  state,  the 
assumption  1*  A  =  0  always  holds,  see  Messay  [29]. 

By  this  lemma,  it  is  legitimate  to  solve  the  inhomogeneous  system  (2.3.10).  By  (2.3.10), 
?o  =  ^o(Po  -  ^0  y  ^  Im(^o)-  By  (2.3.8),  5o  =  ^«n^  ®  yo  where  y^  €  R"'  and  there  are  only 
n2  degrees  of  freedom  in  ^q.  This  suggests  that  the  system  (2.3.10),  which  is  of  order 
N  =  H;  n2,  can  be  reduced  to  a  system  of  order  nj.  To  achieve  this,  we  first  denote  the 
projection  from  Im(/?o)  onto  /?"'  by  £*.  By  (2.3.5), 

'E=\®V  (2.3.11) 

We  note  that  E'  ^q  =  y^  and  E  E'  ^q  =  E  y^  =  ?o-   Premultiplying  (2.3.10)  by  E',  we  get, 

E*(/  +  /?oAo*)££'?o  =  f*^oPo- 
This  is  an  nj  by  nj  system  of  the  form  B  y^  =  b,  where 

B  -  £•(/  +  /?o  Ao*)  E  =  I„^  +  '/?!  £•  Ao"  E,  (2.3.12) 

b'E'R,po  =  'R,  E'  p,  =  ('d„)'  ■  ^/?i  Si  Ij.  (2.3.13) 

Here  ^(f„^  is  given  by  (2.1.7).  By  (2.3.9)  and  (2.1.11),  the  original  null-vector  p  is  given  by 
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p  =  Po  +  ^o"  ^  yo  =  (5i  ®  5:)^  1  +  A^Ey.  (2.3.14) 

Since   fl    is    non-symmetric,    we   can    solve   B  yQ  =  b    by    using    the    normal    equations 
B'  B  yQ  =  B'  b,  or  by  the  Orthodir  method  discussed  in  9  2.2. 

For  Aq  to  be  a  good  preconditioner,  it  is  necessary  that  Aq"  x  can  be  easily  computed  for 
all  X  6  /m(Ao).  In  view  of  (2.3.12),  we  in  fact  only  require  that  E'  A^  E  y  can  be  computed 
economically  for  all  y  C  £*(/m(i4o)).  There  are  three  alternative  ways  of  computing  this 
quantity.  For  clarity,  we  introduce  only  one  alternative  here,  and  leave  the  other  two  to 
Appendix  A.l.  Let  us  remark  that  the  alternative  discussed  here  is  the  easiest  to  handle 
analytically,  but  that  it  is  not  the  best  choice  for  numerical  work. 

Alternative  (A):-  Complete  diagonalization  of  Aq  by  (5^  Qi  ®  52  Qi). 

Using  (2.1.16)  and  some  straightforward  computations,  we  have 

£*  A^  E  =  E'  {I®S^  Q:)  (^i  Qi  ®  /)  2*  {Q\  Sf^  ®  /)  (/  ®  Ql  Sf^)  E 
=  S2  Qi  E'  (5i  Ci  ®  /)  2^  (Gl  5r^  ®  /)  £  Q\  S2' 

=  S2Q2'i>Q'2S{K  (2.3.15) 

where  4>  is  diagonal  and  is  given  by 

n-i 

*  =  S  i\j)'  ■  27^  +  C'in,,)'  •  2;  (2.3.16) 

Here  ^q„  j  denotes  the  (n^J)  entry  of  the  orthogonal  matrix  Q^.  Ij  and  2*  are  given  by 
(2.1.14)  and  (2.1.15).  Putting  (2.3.15)  into  (2.3.12),  we  have 

fl  =  /  +  ^i?!  52  G2  *  Ql  S{^-  (2.3.17) 

Thus,  before  we  start  the  iteration,  we  have  to  generate  {G/,r,},^_j,  the  eigenpairs  of  the 

symmetric  matrices  Sj'^  G,  5,.  This  may  be  done  by  calling  a  standard  eigenvector  subroutine, 

see  the  EISPACK  manual  by  Smith  [38].  Since  the  G,  are  tridiagonal  and  the  5,  are  diagonal, 

this  requires  0(nl)  operations.  Since  2y~^  1  s  j  <  n^,  and  2*  are  diagonal,  <I>  can  be 

generated  in  O(n^)  operations  from  (2.3.16).   With  4>  and  C:  generated  and  stored  before  we 
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itcrate,  the  computation  of  (£*  Aq  E)  y  requires  only  2  n^  +  nj  operations  for  any  vector  y. 
The  storage  requirement  is  nl  +  nl  +  0(/i,),  since  we  need  to  store  the  C,'s.  By  (2.3.17),  as 
^J?i  is  a  bidiagonal  matrix,  the  matrix- vector  multiplication  By  thus  reqiiires  approximately ' 
In]  +  0(n2)  operations.  Notice  that  there  is  no  need  to  compute  the  last  entry  of  4>.  In  fact, 

LEMMA  2.3.2 

4>„  ,  the  last  diagonal  entry  of  <!>,  can  be  defined  arbitrarily. 

ElPiZL        Since  72^  =  0,  it  follows  from  (2.3.16),  (2.1.14)  and  (2.1.15)  that 

4>n,  =  2  i\j)Hyijr'  +  (\^y  7.  (2.3.18) 

where  y  is  arbitrary.  The  lemma  is  therefore  true  if  ^q„^  *  0.  Since  ^q.^  is  the  eigenvector 
corresponding  to  the  zero  eigenvalue  of  Si^GiS^,  thus  by  (2.1.10),   ^q.^  =  S^  1^.   In 

particular,  by  (2.1.7).  \,^  =  a/n  (— A_).  ^  0.     Q 

Combining  this  result  and  the  fact  that  /m(/?o)  »»  an  ("2"  l)-dimensional  vector  space,  we  can 
further  reduce  the  dimension  of  the  system  (2.3.10)  to  (/i^- 1).  We  will  exploit  this  fact  in  9 
3. 

When  n2  is  of  moderate  size,  (2.3.17)  suggests  we  can  compute  and  store  B,  and  then 
solve  B  y^  =  b  by  a  direct  method  sudi  as  Gaussian  elimination.  However,  the  numerical 
results  given  in  9  4  are  computed  by  using  iterative  methods  because  we  are  also  interested  in 
the  case  when  /ij  is  very  large.  Since  each  matrix-vector  multiplication  Bd  requires  about  2  nj 
operations,  thus  the  work  reqtiired  for  the  Orthodir  method  in  the  ;'-th  iteration  is  roughly 
3  j  rtj  +  2  nl,  see  9  2.2.  The  algorithm  converges  very  fast,  and  the  number  of  iterations 
required  for  a  given  accuracy  is  roughly  independent  of  n^.  Solving  the  normal  equations 
would  require  4n|  -I-  ©(/ij)  operations  per  iteration.  However,  the  number  of  iterations  is 
about  half  of  that  required  by  the  Orthodir  method. 


-27- 

After  obtaining  the  solution  Ui  B  y^  =  b  by  an  iterative  method,  we  can  generate  the 
original  null-vector  p  by  (2.3.14).  We  note  that  this  step  may  require  N  (n^  +  rij)  operations 
since  Aq  E  y^  is  not  sparse  and  N  storage  spaces  are  required  for  holding  p.  However,  in 
some  particular  but  interesting  cases,  we  can  rediice  both  the  operation  count  and  storage. 
For  example,  if  only  the  blocking  probability  />„  _i^  _;  is  required,  then  we  only  need  to 

calculate  an  expression  of  the  form  e^-AQ  E  y,  where  e^-  is  the  A^-th  unit  vector  in  R^ .  This 
requires  even  fewer  components  of  the  solution  than  E'  Aq  E  y  and  can  be  evaluated  in 
0(nl)  operations  and  requires  only  0(12)  storage.  As  another  example,  suppose  we  want  the 

probability  of  overflow  from  queue  1  to  queue  2,  which  is  given  by   2)  Pn  -ij>  O'  more 

y-o     ' 

generally,  suppose  only  a  linear  fimctional  of  p  is  required.  In  these  cases,  there  is  no  need 
to  generate  and. store  p  explicitly.  The  idea  is  to  generate  the  solution  one  block  at  a  time, 
and  then  accumulate  its  contribution  to  the  functional  before  we  generate  another  block. 
More  precisely,  suppose  we  want  to  calculate  /*  p ,  where  /  is  a  vector  in  R^ .  By  (2.3.14)  and 
the  fact  that  the  entries  of  5,  are  given  by  (2.1.7),  we  only  need  to  evaluate  an  expression  of 
the  form 

s  -  /•  (A^  E  yo)  =  r  (S,  Ci  ®  5:  G;)  2*  (Ql  S^'  ®  Ql  S^')  E  y^. 

Lcl  us  first  partition  /  into  n,^  block,  !^,  .  .  .  ,f„ ,  each  with  n,  entries,  and  define 


Wj   = 


\j  Od„)-'  ■  S2  Q2  2;-^  {Ql  S2'  yo),   i^j<n„ 
\^,  Cd„)-'  ■  S2  C:  2;  (G2  S{'  yo).  j  =  n,. 


(2.3.19) 


then  it  is  easy  to  check  that 


y-u-i 

Thus  it  is  dear  that  we  can  generate  wj  one  at  a  time  and  accumulate  its  contribution  to  /"  p 
before  we  generate  another  wj.  Hence  no  extra  storage  is  required  for  p.  Notice  also  that 
this  way  of  acomiulating  the  result  blockwise  does  not  increase  the  work.  Similar  techniques 
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are  also  discussed  in  Banegas  [3]. 

Let  us  consider  an  important  special  case,  the  single  server  case,  where  i,  =  1.  In  this 
case,  we  can  derive  explicit  formula  for  G„  T,  and  <I>.  We  will  show  that  Q,x  can  be 
computed  by  using  the  Fast  Fourier  Transform.  Hence  the  operations  count  and  storage 
required  for  each  iteration  can  be  further  reduced  by  almost  a  factor  of  n,. 

We  first  give  the  formula  for  Q,  and  T,,  /  =  1,  2.  By  (2.1.6),  with  *,  =  1,  we  have 


5,-^  G,  S,  =  Vx,  ill- 


-1  p,+  -^ 
Pi 


-1         0 


0  -1  p,+  -L  -1 

P( 

-1    i 

Pi 


(2.3.20) 


for  Z=l,2,  where  p,  =  (—)\  Let  6, ,  =  -i^,  and  define  4*,  /  by 


1 


imi^,j-9,j)  =  — sun|>,y, 


for  ls;<n,,  or  equivalently, 


(2.3.21) 


Pf  sin^Q;^ 


l-2p;COs9,^  +  pf 


It  is  easy  to  check  that 


%  = 


i-^)\im^,j  ,  sin(9,^  +  4»,^)  , sin((n,-l)9,^  +  .|»,^))*   lsy</i„ 


(^)'*(1  ,  P, P,"'"  V 

I-P?"' 

are  the  normalized  eigenvectors  of  5,"  ^  C,  Si  with  eigenvalues 


;  =  ";. 


(2.3.22) 


(2.3.23) 


lij  = 


Vx^(p,+  — -2cos9,,)  lsy<n„ 

P/ 

0  ;=«,. 


(2.3.24) 


-29- 

Thus  Q,  =  i'qxJqj '^n)  <^*°  ^*  generated  without  calling  any  EISPACK  subroutine. 

Next  we  claim  that,  for  any  real  vector  x,  Qi  x,  /=1,2  can  be  computed  by  using  the 
Fast  Fourier  Transform.  In  fact,  the  *-entry  of  this  vector  is  given  by 


=  (-)'*I,Mik-l)9,j+^,j)xj  +  ( ^)'*phn, 

"'  J'^  1-p, ' 


=  C-^)'*  IMAG 


t 

.,-1 


^.'^^-^'-v-C.'^) 


y-1 


l-P/ 


where  IMAG  means  taking  the  imaginary  part.   The  numbers  ^y  "  «  '■'  or^  ,  y=l,...,n,  can  be 

"' 
computed  by  using  n,  complex  multiplications.   The  expression  5)'      '  '"'^y  can  be  evaluated 

by  the  Fast  Fourier  Transform.  This  requires  only  0(n,logn,)  operations  for  arbitrary  n,,  sec 
Appendix  A.4.  We  remark  that  when  using  the  Fast  Fourier  Transform,  there  is  no  need  to 
store  Q,.  Thus  the  storage  requirement  is  reduced  from  0(nf)  to  0(n,).  From  (2.3.17),  we 
sec  that  the  work  and  storage  required  for  computing  the  matrix-vector  multiplication  Bd  are 
thus  reduced  to  0{n^oy\^  and  0{n-^  respectively.  The  work  of  generating  the  whole  null- 
vector  p  can  also  be  reduced  to  0{n\  lognj).  If  only  one  of  the  j,  =  1,  we  can  still  reduce  the 
work  and  storage  by  using  alternatives  B  or  C  given  in  Appendix  A.l. 

Finally  we  give  a  formula  for  <1>.  First  we  recall  from  lemma  2.3.2  that  <!>„  can  be  set 
arbitrarily.  For  l^_/</i2,  by  (2.3.16),  they-th  diagonal  entry  of  *  is  given  by 


i-i 
By  using  the  formulas  for  gj,  7,^  and  4»,^  in  (2.3.22),  (2.3.23),  and  (2.3.24),  we  get 

^   ^     2  "^Sin^((ni-l)e,^   +^,J    ^      \-p\     p?"^' 


30 


it,-i 


21,-2 


1  -  P?       Pi    ^ 


fi,-i 


1       2     ^"r^ 
1  -  Pi    Pi  ' 


ni  iifi  (l-2piCOsei^+Pt)(p,i(l-2piCOsei^+Pl)   +  Tf2^)        1  -  pj"'      '>'2J 


sin^ei^i _2_  ^iSy. 


n.-l 

"i    l2j  1^1  (l-2p,cos8i_i+p|)  "i    "y2^it'ijti(l-2piCOse,jt+p^i)    +  -ijj 


1      2     2"r2 
1  -  Pi    Pi 


Using  the  Poisson  summation  formula  (see  (2.2.17)  and  (2.2.20)),  the  first  term  is  equal  to 


,  which  combines  with  the  third  term  and  gives 


1  2     M-i 


"1-1 


A  i  •'       r-l    V» 


sin^e 


M_ 


72y         "1   Tf2j  i-ijiLi(l-2piCOse;_i  +  pt)   +7:^ 


(2.3.25) 


The  second  term  in  (2.3.25)  can  also  be  computed  by  using  the  Poisson  simimation  formula. 
To  apply  the  formula,  we  notice  that 


(l-2piC0sei_i+pi)   +  -7^  =  P1I 


^  +  ^  -  2  +      V<^      +  4sin= 


,  Oij 


=  — |l  -  lajCOi^Q^j,   +  aj\. 


where  a,  is  the  smallest  root  of 


Pi         VXiM-i 


Thus, 


n.-l 
"1     i-1 


sin*  9, 


lA. 


(l-2piCos9ijk+pi)  + 


n,-l 


sin^9, 


(2.3.26) 


rijL  "1    Pi  i-i    l-2^yC0S  9ijk +a/ 
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Pi 


1       ^T^ 


1- 


2", 


Put  this  back  into  (2.3.25)  and  we  have 


LEMMA  2.3.3 


In  the  single  server  case, 


«I>,  = 


• 

1 

1-^ 
Pi 

1       2"r^ 


arbitrary 


where  oj  are  given  by  (2.3.25).     □ 


\^j<n2. 


;  =  '«:. 


(2.3.27) 


Thus  4>  can  be  generated  in  O^rti)  operations. 

Before  we  turn  to  other  problems,  let  us  remark  that  the  matrix  A  has  an  analogy  in  the 
continuous  case.  Notice  that  ^/?|  in  (2.3.6)  is  the  forward  difference  operator  on  a  line.  Thus 
/?o  in  (2.3.5)  resembles  an  operator  which  is  zero  in  a  rectangxilar  region,  but  with  a 
tangential  derivative  along  one  of  the  side.  This  particular  side  corresponds  to  those  states 
where  the  first  queue  is  full.  Recall  that  Aq  resembles  a  finite  difference  approximation  of  an 
elliptic  operator  with  a  transport  term  on  the  same  region  and  with  Neumann  boundary 
conditions  on  every  sides.  Thus  the  continuous  analogy  of  A  =  Aq  +  Rq  is  the  finite 
difference  approximation  of  the  same  operator  as  Aq,  but  with  an  oblique  derivative  on  the 
particular  side.  Using  this  analogy  and  the  result  obtained  in  9  2.2.1  and  9  2.2.2,  an  intuitive 
explanation  for  the  fast  convergence  of  the  method  is  that  the  preconditioner  Aq  is  a  good 
approximation  to  the  operator  A ,  in  the  sense  that  it  changes  its  oblique  boundary  condition 
into  a  Neiunann  boundary  condition.  In  9  2.4,  we  will  design  precondidoners  that  change  the 
oblique  boimdary  condition  to  a  Dirichlet  or  a  mixed  type  boundary  condition.  Numerical 
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results  in  5  4  show  that  the  method  converges  faster  if  the  preconditioner  has  the  same  or 
similar  type  of  boundary  condition  as  the  original  operator. 

In  the  single  server  case,  the  under lyirg  elliptic  operator  has  constant  coefficients  and  is 
given  by 

+  2in,-  l)(ti,i  -\OPx  +  2(n2- l)(pL2  -  Xj)  p^  ~  0.  (2.3.28) 

Thus  for  n,  large,  it  is  sensible  to  consider  a  limit  such  that  p.,  -  X,  =  0((n,-l)°)  for  a 
certain  a.  In  5  3,  we  will  analyse  the  method  under  this  limit.  In  the  next  sub-section,  we  will 
consider  models  where  overflow  is  permitted  in  both  directions. 

i  2.3.2         Overflow  In  Both  Directions 

A  more  interesting  model  is  to  permit  overflow  in  both  directions.  More  precisely,  we 
assume  that  whenever  a  queue  is  full,  customers  entering  that  queue  will  be  served  by  the 
other  queue.  When  both  queues  are  full,  customers  entering  the  network  are  lost. 

Let  p  denote  the  steady-state  probability  distribution  for  this  model.  Then  the  balance 
equations,  in  matrix  term,  can  be  written  as, 

Ap  =  {Aq  +  R)p=0,  (2.3.29) 


with  constraints 


Here, 


l'p  =  h  (2.3.30) 

p,j  a  0.  (2.3.31) 

R  =  i\  'K)  ®  *^i  +  %  ®  C'n,  'S).  (2.3.32) 


where .'/?,  are  given  by  (2.3.6).  This  system  differs  from  the  one  we  have  in  9  2.3.1  only  in 
R.  The  second  term  in  R  corresponds  to  the  overflow  from  queue  2  to  queue  1.  Notice  that  R 
is  still  sparse  and  A  still  satisfies  the  assimiption  of  lenuna  2.1.1.    Thus  A  has  a  one 
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dimensional  null-space  with  positive  nuU-vector.  Hence  the  solution  p  exists  and  is  unique. 
We  note  that 

Im{R)  =  {  X  6  i?'*''  I  ;t  =  i«„  ®  V  +  V  ®  W  ,  'y  €  R"'  .  2('>)y  =  0  }  C  /m(^o)-        (2-3.33) 

'  u 

Moreover,  by  (2.1.25),  (2.1.21)  and  (2.3.30)  there  exists  a  unique  Jq  €  /m(Ao)  such  that 

P=Po  +  A^  fo-  (2.3.34) 

Substituting  this  into  (2.3.29)  and  using  (2.1.24),  we  get 

iI  +  RA^)lo=  -RPo-  (2.3.35) 

It  is   easy   to   dieclc   that  R  Pq  #  0.    Thus   the   homogeneous   system   (2.3.29)    is   again 

transformed    into    an   inhomogeneous   one.    By   lemma   2.3.1,    we   see   that   the   matrix 

(I  +  RAq)    is    nonsinguJar.    By    (2.3.35),    Iq  =  -R  p  ^  Im(R).    Thus   |j   has    at   most 

m  ■  n^  +  /i2  degrees  of  freedom.  Let  us  denote  by  £*  the  projection  from  Im{R)  onto  R", 

and  let  Jq  "  E'  fg-  We  have  £  £*  fg  =  £  Jg  =  fo-  Notice  that  for  any  5  ^  Im(R)  and  y  ^  R"*, 

the  computation  of  £*  ^  and  £  y  requires  no  arithmetic  operations.  In  fact,  it  can  be  done  by 

deleting  or  inserting  zeros  in  the  vector  ?  and  y  respectively.   Premultiplying  (2.3.35)  by  £*, 

we  obtain  an  m  by  m  system 

(I  +  E'RA^  E)yo=  -E'Rpo,  (2.3.36) 

which  can  be  solved  by  the  iterative  methods  previously  considered.   After  obtaining  yQ,  the 

solution  p  is  given  by 

P=Po  +  A^E  Jq. 
Notice  that  in  each  iteration,  we  have  to  compute  a  matrix-vector  product  of  the  form 
(I  +  E'  R  Aq  £)  Jo-  I^ost  of  the  work  will  be  in  computing  the  product  E'  R  Aq  E  y  for 
y  6  /?".  Notice  that  here  we  do  not  have  an  explicit  formula  for  £'  R  Aq  £  as  in  (2.3.17).  If 
we  use  Alternative  A  of  the  last  section,  without  taking  advantage  of  the  sparsity  in  R,  about 
2n^  operations  are  required.  In  the  single  server  case,  this  will  be  reduced  to  2n^  logn 
operations.  This  is  the  same  operation  count  as  for  solving  the  discrete  Laplacian  in  the 
rectangular  region  by  using  the  Fast  Fourier  Transformation.  In  Appendix  A.2,  we  have 
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designed  an  algorithm  that  requires  only  6n^  +  0(n)  operations  per  iteration  and  n^  +  0(n) 
memory.  If  one  of  the  s,  equals  one,  we  can  reduce  the  counts  to  5n^  +  0(n)  operations  and 
0(/i)  memory  spaces. 

We  remark  that  the  matrix  R  is  the  discrete  approximation  of  the  operator  which  is  zero 
in  the  rectang\ilar  region  and  has  tangential  derivatives  along  two  of  the  sides.  These  sides 
correspond  to  states  where  one  of  the  queue  is  full.  Thus  the  matrix  A  =  Aq  +  R  resembles 
the  finite  difference  approximation  to  the  same  continuous  operator  as  discussed  in  previous 
sections,  but  with  oblique  derivatives  on  two  of  the  sides  and  Neiunann  boundary  conditions 
on  the  remaining  ones.  Numerical  results  show  that  the  preconditioned  system  converges 
very  fast,  the  number  of  iterations  required  to  attain  a  given  accuracy  is  roughly  constant 
independent  of  n .  This  fast  convergence  can  again  be  explained  informally  by  the  results  in  § 
2.2.1. 

In  the  next  section,  we  will  design  other  separable  preconditioners  for  these  networks 
by  first  perturbing  the  balance  equations.  Using  the  analogy,  these  preconditioners 
correspond  to  the  finite  difference  approximation  of  the  same  operator  but  with  a  different 
type  of  boundary  conditions. 
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i  2.4        Other  Separable  Precondltlonen 

In  this  section,  we  will  design  other  preconditioners  for  the  models  we  discussed  in  § 
2.3.  We  consider  only  separable  preconditioners  here  because  such  systems  can  be  solved 
economically.  For  simplicity,  we  confine  ourselves  to  the  model  discussed  in  §  2.3.1.  The 
idea  can  easily  be  extended  to  more  general  networks. 

Let  us  assume  that  we  are  solving  (2.3.2)  -  (2.3.4).  Since  A  has  a  one  dimensional 
null-space,  we  can  fix  one  component  of  p ,  and  solve  the  resulting  non-singular  system. 
More  precisely,  by  lemma  2.1.1,  p  is  positive,  and  is  unique  up  to  a  multiple  constant,  thus 
we  can  always  setpv  ~  ^  ^^^  partition  the  system  (2.3.2)  as 


Ap  = 


=  0.  (2.4.1) 


B    d       p 

c'  -n    p.v 

Using  the  facts  that  A  is  irreducible  and  has  zero  column  sum,  we  see  that  B  is  trredudbly 
diagonally  dominant  and  hence  nonsingular.  Thus  we  can  proceed  to  solve  the  reduced 
system  B  p  =  -  d  by  direct  or  iterative  methods,  see  Kaufman  [25]  and  Funderlic  and 
Mankin  [17].  However,  it  is  impossible  to  design  a  separable  preconditioner  for  B  because  its 
dimension  is  n^/ii-l.  To  get  around  this,  we  can,  instead  of  considering  submatrices  of  A, 
consider  a  perturbed  version  of  (2.3.2).  More  precisely,  we  fix  pv  such  that  XiPv  =  1,  or 
equivalently,  let  \iCe„  ^**  ®  ^e„  ^e'„)  p  =  fv  We  then  obtain  p  from 

Ap  =  {A  +  \,{\  W,  ®  ''.,  K)  }  P  =  '.V-  (2.4.2) 

Notice  that  A  is  irreducibly  diagonally  dominant  and  therefore  non-singular.  Since  A  is  of 
order  n-ji^,  it  is  now  possible  to  design  separable  preconditioners  for  A. 

{I)  A  family  of  separable  preconditioners  for  A 
Let  us  partition  A  as  A  =  A^  -♦■  R^,  where 
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^1  =  ^1  •{'««/<  ®  tridiag  ( -1  ,  0  ,  0  )  }, 

V,=  G,+  \r\'\'  (2-4.3) 

Here  the  G,  are  given  by  (2.1.6).  Qearly  A^  is  separable  and  V^  is  irreducible  d';'.gonally 
dominant.  Hence  Ai  is  non-singtdar.  We  can  write  p  =  A'^^  ^^  and  solve  the  preconditioned 
system 

aa:'^,  =  (/ +  ^iA:M?i  =  '.%•• 

Because  of  the  sparsity  of  R^,  we  can  reduce  this  to  an  /ij  ^  "2  system  as  in  9  2.3.1. 
Unfortunately,  nimierical  results  show  that  the  convergence  rate  for  this  preconditioned 
system  is  very  slow.  Notice  that  ^4^  resembles  the  finite  difference  approximation  of  a  second 
order  elliptic  operator  on  the  square  with  a  Dirichlet  boundary  condition  on  one  of  the  side 
(see  (2.4.3))  and  Neumann  boundary  conditions  on  the  remaining  sides.  In  fact,  if 
X,  =  jjL,  =  5(  =  1  in  (2.4.3),  then  V^  =  tridiag  (  -1  ,  2  ,  -I)  -  el  e^.  This  is  exactly  the 
finite  difference  approximation  of  a  simple  second  order  ordinary  differential  operator  with  a 
Neumann  type  data  at  one  end  and  a  Dirichlet  type  data  at  the  other.  Thus  an  intuitive 
explanation  for  the  slow  convergence  is  that  A^  is  not  a  good  approximation  to  A.  It  changes 
the  oblique  derivative  in  A  into  a  Dirichlet  boundary  condition. 

Notice  that  A^  is  not  the  only  non-sing\ilar  separable  preconditioner  for  X,  in  fact,  there 
exists  a  family  of  non-singular  separable  preconditioners.  Let  us  defme,  for  any  ^, 

Vp  =  Ci+pXi-(\i^;),  (2.4.4) 

A3  =  V3®/„^  +  /„^®G:,  (2.4.5) 

Rf,  =  \r  {\  'e:^  ®  tridiag  (-1,1-3,0)}. 

We  note  that  A  =  A^  +  R^.  Qearly  A^  is  separable.  When  3  >  0,  Vg  is  irredudbly 
diagonally  dominant  and  hence  A^  is  nonsingular.  We  can  then  defme  p  =  A^^  ^^,  and  solve 
for  Sp-  These  preconditioners  correspond  to  operators  with  a  mixed  type  of  boundary 
conditions  on  the  side  in  question.  Our  nimierical  results  show  that  the  performance  improves 
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when  ^  gets  closer  to  zero.  This  can  again  be  explained  informally  by  the  analogy  mentioned 
in  9  2.2.1  and  §  2.2.2. 

Let  us  consider  the  case  when  3  =  0.  We  obtain  Vq  =  G^  and  Aq  =  Aq.  This  is  the 
preconditioner  considered  previously.  It  is  singular  and  corresponds  to  the  operator  with 
Neumann  boundary  conditions  on  every  sides.  Hence  we  cannot  set  p  =  Aq^  ^q  and  solve 
for  ?0'  However,  we  can  still  design  a  fingular  separable  preconditioner  for  the  non-singiilar 
matrix  A,  see  (II)  below.  For  p  <  0,  the  preconditioners  again  correspond  to  operators  with 
mixed  type  boundary  conditions.  Numerical  results  show  that  the  convergence  rate  is  slower 
when  P  becomes  more  negative.  We  remark  that  V^  in  (2.4.4)  can  be  symmetrized  by  a 
diagonal  matrix,  but  it  is  no  longer  definite.  More  precisely,  by  the  Cauchy  interlace  theorem 
(see  Parlett  [34]),  and  (2.4.4),  Vp,  and  hence  A^,  has  one  negative  eigenvalue. 

(//)  Separable  Preconditioner  for  A  when  3  =  0 


By  (2.1.25),  there  exists  unique  scalar  a  and  ^q  ^  ^'"('^o)  such  that  p  =  a  Pq  +  Aq  ^q. 
Since  we  have  set  X^pv  =  1,  a  is  no  longer  arbitrary.  Thus  besides  solving  for  ^g.  ^c  need 
an  extra  equation  for  a.  Since  A  =  Aq  +  Rq,  we  have 

A(apo  +  A^  iQ)  =  il  +  RqA^  )^o  +  aRoPQ  =  fv 

Since  ^g  €  ^'»(^o)>  ^^  have  1*  5o  -  0-  This  is  the  extra  equation  we  need.  We  can  now  solve 
the  following  (N+1)  by  (A^+1)  system 


Ff 


(7  +  RqAq  )  RqPo 
1*  0 


(2.4.6) 


We  claim  that  F  is  non-stnguJar.  To  prove  this,  suppose  that  (  ^  ,  3  )*  is  in  the  kernel  of 
F.  This  implies  that  A  (^  p^j  +  Aq  ^)  =  0  and  1*  5  =  0.  Since  A  is  non-singular,  the  first 
condition  implies  that  3  Po  '*'  '^o^  ^  ~  ^'  ^^  second  condition  implies  that  §  6  /m(i4Q),  which 
by  the  definition  of  Aq  ,  implies  that  -4^  5  €  /m(Ao).  Thus  by  (2.1.20),  we  have  3  =  0  and 
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Aq  i  =  0.  By  rht  invertability  of  Aq  on  /m(Ao),  we  have  5  =  0.  Hence  F  is  non-singular  and 
it  is  legitimate  to  solve  f or  /  in  (2.4.6).  By  the  sparsity  of  Rq,  we  can  reduce  (2.4.6)  to  an 
(n2+l)  by  (nj+l)  system. 

Notice  that  by  (2.4.2),  A  differs  from  A  by  a  rank  one  matrix.  Thus  Rq  differs  from  the 
J?o  in  (2.3.5)  by  a  rank  one  matrix.  Hence  the  A^  by  A^  leading  sub-matrix  of  F  F'  differs 
from  the  preconditioned  matrix  (I+RqAq)  (I+RqAq)'  considered  in  9  2.3.1  by  at  most  a 
rank  three  matrix.  Using  the  Cauchy  interlace  theorem,  the  singular  values  of  (I+RqAq) 
will  interlace  the  singular  values  of  F,  except  possibly  a  few  outlying  ones.  In  particular,  if 
the  singxilar  values  of  (I  +  RqAq  )  aic  clustered,  so  will  be  the  singular  values  of  F.  The 
numerical  results  in  5  4  show  that  the  convergence  rate  for  these  two  systems  are  very  much 
the  same. 
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i  2.5        General  Overflow  Qoeoeing  Models 

The  idea  of  precx)nditioning  the  singular  generating  matrix  A  by  another  singiilar  matrix 
Aq  can  easily  be  extended  to  more  general  overflow  queueing  networks  where  A  and  Aq  have 
the  same  order.  Let  us  first  illustrate  how  to  apply  our  method  to  the  3-queue  model 
discussed  in  Kaufman  [25].  We  then  generalize  our  method  to  a  family  of  overflow  queueing 
models  with  ^  >  3. 

i  2.5.1        A  Three-qoeae  Overflow  Model 

Gansider  the  following  3-queue  network.  Customers  entering  or  being  overflowed  into 
the  i-th  queue,  i  =  1,  2,  will  be  overflowed  and  served  at  the  (j-i-l)-th  queue  if  all  the  spaces 
in  the  i-th  queue  are  occupied.  Customers  entering  the  third  queue  will  be  lost  if  the  third 
queue  is  filled.  Moreover,  once  a  customer  entered  a  queue,  he  will  stay  in  that  queue  until 
he  is  served.  In  particular,  customers  cannot  jump  between  the  waiting  lines.  The 
Kolmogorov  equations  of  this  network  are  given  by 

+  M"!  niin(/,Ji)  +  \L2  Tnm(J ,S2)  +  jtj  niin(*,j3)  \pijj, 

=  Xi(l-8;.o)P!-v^  +  C^-^jfijO^'^t^^-i+h)  Pij-ij,  (2-5-1) 

+  M,2minO  +  l,J2)(l-8y^j_i)p,^+i^  +  jij  min(fc  +  l,j3)(l-8i^^_J  ;7,_/_i+i, 

where  0^i<n^,  Osj<n2  and  0^k<ny  Thus  there  are  ^  =  n,  nj  n^  states  in  the  system.  The 
corresponding  matrix  equation  is  given  by 

A  p  =  (  Ao  +  /?o  )  P  =  0.  (2.5.2) 
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HereAo  is  given  by  (2.1.17)  and 

Ro  =  \  W,  ®  ^i?,  ®  /  +  /  ®  \  'e:^  ®  'R2  +  \  W,  ®  \  K,  ®  '«!,    (2.5.3) 
where  .'J?,  are  given  by  (2.3.6).  Since  p  is  a  probability  distributior,  we  further  require  that 

rp  =  l,  (2.5.4) 

Pijj,  a:  0.  (2.5.5) 

The  terms  in  (2.5.3)  indicate  the  direction  of  the  overflow.  The  first  term  corresponds  to  the 

overflow  of  customers  from  queue  1  to  queue  2,  the  second  term  corresponds  to  the  overflow 
from  queue  2  to  queue  3  and  the  third  term  corresponds  to  the  case  where  queue  1  and  queue 
2  are  filled  and  aistomers  from  queue  1  are  overflowed  to  queue  3.  We  note  that  R^  is 
sparse.  It  has  only  m  ^  {n^-^  n^  -  I)  n-^  rows  with  non-zero  entries,  and  that  every  such 
row  has  at  most  three  non-zero  entries. 

As  in  9  2.3,  we  can  transform  (2.5.2)  into  an  inhomogeneous  system 

(/-^/?oAo*)?o=  -^o/'o.  (2-5-6) 

where  ^q  ^  '"'(■^o)  ^"'^ 

P  =  Po  +  ^^o"  Co-  (2-5.7) 

By  using  (2.5.3),  it  is  easily  checked  that  RqPq  =it  0.  By  the  form  of  •'/?,  in  (2.3.6),  it  is  also 

clear  that  A  =  A^  -^  Rq  satisfies  the  assumptions  of  lemma  2.1.1,  hence  the  solution  to 

(2.5.2),  (2.5.4)  and  (2.5.5)  exists  and  is  unique.  Thus  by  lemma  2.3.1,  we  sec  that  the  matrix 

in  (2.5.6)  is  non-singular.  Since  Rq  is  sparse  and  has  m  non-zero  rows,  we  can  reduce  this 

system  to  an  m  by  m  system.  Notice  that  we  need  0{m})  =  0(nf)  storage  spaces  to  hold  the 

entries  of  this  matrix.  For  a  network  with  n,  =  20,  the  storage  requirement  is  about  640,000. 

Using  iterative  methods  discussed  in  9  2.2  will  require  much  less  storage.  The  cost  per 
iteration  depends  mostly  on  the  cost  of  the  matrix-vector  multiplication.  If  we  use  (2.1.22)  to 
compute  Aq  5  fo'  a°y  vector  5,  it  is  clear  that  this  computation  require 
2  "1 ":  "3  ("1  +  "2  +  nj)  +  Oinf)  operations.  By  using  the  sparsity  of  Rq,  Rq  5  can  be 
computed  in  3  m  =  0{nf)  operations.  Thus  assuming  that  n^  =  n2  =  ^3  "  n,  the  cost  per 
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iteration  is  6n*  +  0(n^)  operations.  The  storage  requirement  is  n^  +  0(n^).  In  the  single- 
server  case,  we  can  reduce  the  operation  counts  of  our  method  to  O(n'logn).  Let  us  point 
out  that  by  using  the  sparsity  of  Rq,  we  can  actually  reduce  the  counts  to  0(n?-)  storage  spaces 
and  O(n^)  operations  per  iteration.  This  algorithm  is  given  in  Appendix  A.3. 

We  remark  that  by  using  (2.5.3)  and  the  form  of  ^R,  in  (2.3.6),  the  operator  Rq 
resembles  an  operator  which  is  zero  in  a  cube  and  has  tangential  derivatives  on  two  of  the 
faces.  These  particular  faces  correspond  to  those  states  where  either  queue  1  or  queue  2  is 
fuU.  Recall  that  Aq  resembles  the  standard  seven  point  operator  that  obtained  when 
discretizing  an  elliptic  operator  in  a  cube  with  Neumann  type  boundary  conditions 
everywhere.  Thus  the  matrix  A  =  Aq  +  Rq  resembles  Aq  but  with  oblique  derivatives  on  the 
two  particular  faces.  Hence  under  this  analogy,  Aq  and  A  has  the  similar  boundary  conditions. 
From  our  previous  experience,  we  expect  that  Aq  to  be  a  good  preconditioner  in  this  case  and 
this  fact  is  confirmed  by  the  numerical  results  in  5  4. 

Next  we  extend  our  method  to  a  more  general  type  of  overflow  models  which  include 
this  model  and  the  models  we  discussed  in  §  2.3  as  particular  cases. 

S  2.5.2         A  ^-Qoeae  Overflow  Model 

Let  us  consider  the  following  ^ -queue  networks.  Customers  entering  into  a  full  queue 
will  be  overflowed  to  another  queue  or  be  blocked  and  lost.  To  avoid  ambiguity,  we  assume 
that  for  any  given  queue  there  is  at  most  one  direction  of  overflow  of  customers.  However, 
we  permit  successive  overflowing,  i.e.,  customers  being  overflowed  from  the  j-th  queue  to 
the  7-th  queue  say,  can  be  overflowed  to  another  queue  if  the  ;-qucue  is  also  filled.  To 
prevent  customers  from  wandering  within  the  network,  we  assume  that  any  given  path  of 
successive  overflowing  does  not  form  a  loop.  We  assume  moreover  that,  once  a  customer  has 
entered  a  queue,  he  will  wait  in  that  queue  until  he  is  served.  In  particular,  customers  cannot 
jump  between  the  waiting  lines  and  the  overflow  of  customers  from  any  queue  can  occur  only 


.42- 

when  the  queue  is  full.  These  conditions  will  ensure  that  all  the  AT  states  in  the  network  are 
admissible.  Hence  the  generating  matrix  A  will  be  of  the  same  order  as  the  preconditioner 
i4o.  As  in  5  2.5.1,  we  have 

A  =  Ao  +  /?o.  (2-5.8) 

where  /?o  consists  of  terms  like  those  in  (2.5.3).  To  be  more  specific,  let  us  use  the  notations 
Ef  -  'e„  'e'„  and  £?  -  'Rj  -  /  where  the  matrices  '/?y  are  given  by  (2.3.6).  If  customers 
entering  the  z-queue  will  be  overflowed  and  served  at  the  j-queue  when  the  /-queue  is  full 
then  /?o  contains  the  following  term 

/?»=    (^£f-V^  (2.5.9) 

This  is  because  overflow  from  the  j-th  queue  to  the  y'-th  queue  corresponds  to  adding 

^k,n-iiX-^k,n,-dhPk, *,  (2.5.10) 

to  one  side  of  the  Kolmogorov  balance  equations  and 

{'^-K)\.r^hPk, k,-i *,  (2.5.11) 

to  the  other  side,  for  1  s  jt,  <  n^,  1  s  j  s  q.  The  term  in  (2.5.10)  indicates  that  we  are 

leaving  the  state  {k^ k^)  at  an  additional  rate  X,  when  the  i-th  queue  is  full  and  the  j- 

th  queue  is  not  yet  full.   The  term  in  (2.5.11)  indicates  that  we  are  entering  the  state 

{k^,  .  .  .  ,  k^)  at  an  additional  rate  X,  from  the  state  (it^ *)_! ifc^)  when  the  «-th 

queue  is  full.  Successive  overflow  from  one  queue  to  another  means  that  the  term  added  to 

/?o  has  more  £,  factors  in  it.  For  example,  if  overflow  from  the  i-th  queue  to  the  ;-th  queue 

can  be  overflowed  to  the  Jk-th  queue  when  the  j-th  queue  is  filled,  then  we  add  to  Rq  the  term 

R,j,  =  ^^E^"  Ej" 'R^.  (2.5.12) 

This  follows  from  the  fact  that  we  have  to  add 

8/,  n,-l  \  nj-i  (1-  8/^  „,- l)X,  p,^ ,^ 

and 
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for  1  ^  /<  <  n,,  1  s  ys  9,  to  the  two  sides  of  the  Kolmogorov  balance  equations.  Notice  that 
every  terms  added  to  Rq  has  one  and  only  one  JR,  factor  in  it.  Here  the  i  and  j  indicate  the 
origin  and  the  final  destination  of  the  overflowed  customers  respectively. 

Notice  that  by  (2.3.6)  and  the  definition  of  £,,  all  the  terms  added  to  Rq  have  zero 
column  sums,  nonnegative  diagonal  and  nonpositive  off-diagonal  entries.  Thus  A  =  Aq  +  Rq 
still  satisfies  the  assumptions  of  lemma  2.1.1.  Hence  the  steady-state  probability  distribution 
exists  and  is  imique.  Thus  by  lemma  2.3.1,  the  system  (I  +  RqAq  )  is  non-singiilar.  Next  we 
notice  that  those  terms  added  to  Rq  are  sparse.  In  fact,  since  overflow  occurs  only  when  at 
least  one  of  the  queues,  say  the  t'-th  queue,  is  full,  the  corresponding  term  has  at  most  N/rii 
non-zero  rows.  Moreover,  since  every  such  term  represents  the  overflow  from  one  queue  to 
another,  every  non-zero  rows  has  at  most  two  non-zero  entries.  If  we  permit  aU  possible 
directions  of  overflow  within  the  network,  the  total  number  of  non-zero  rows  in  Rq  will  be 
boimded  above  by 


m^-N-tj--  (2.5.13) 

Moreover,  the  total  number  of  non-zero  entries  in  every  such  rows  will  not  exceed  q  +  I, 
since  there  are  only  q  queues  in  the  network.  Thus  Rq  is  also  sparse. 

Using  the  sparsity  of  Rq,  we  can  reduce  the  system  (I  +  RqAq  )  to  a  system  which  has 
order  at  most  m^.  However,  for  general  network,  this  will  often  be  too  large  to  handle  by 
direct  methods.  For  these  methods  would  require  0{m^)  =  0(n^'~^)  storage  spaces.  Notice 
that  for  any  given  vector  5.  the  vector-matrix  multiplication  Rq  5  requires  at  most  (q+1)  m. 
operations.  This  work  is  negligible  when  compared  to  the  work  of  computing  i4o'  4  ^y 
(2.1.22),  which  is  of  ©(n'"^^).  Thus  using  iterative  methods  mentioned  in  §  2.2,  the  work  and 
storage  required  for  each  iteration  are  roughly  0(n«*^)  and  0(ni)  respectively. 

Notice  that,  using  the  continuous  analogy,  each  term  added  to  Rq  corresponds  to  a 
forward  difference  operator  on  a  particular  face  in  the  ^-cube.    These  faces  correspond  to 
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states  where  overflow  occurs.  Thus  Rq  corresponds  to  an  operator  which  is  zero  in  the  N' 
cube,  but  with  tangential  derivatives  on  some  of  the  faces.  RecaU  that  Aq  corresponds  to  an 
elliptic  operator  acting  on  this  A^-cube  with  Neumann  boundary  conditions  everywhere.  Thus 
the  matrix  A  =  Aq  +  Rq  resembles  the  same  operator  but  with  oblique  derivatives  on  those 
particular  faces.  Hence  the  boundary  conditions  of  A  and  Aq  are  of  the  same  type.  From  the 
experience  in  previous  sections,  we  expect  our  method  to  have  fast  convergence  for  these 
kind  of  models. 

In  the  next  section,  we  consider  models  in  which  the  dimension  of  the  state  space,  i.e., 
the  dimension  of  the  generating  matrix  A ,  is  less  than  the  dimension  of  Aq. 


.45- 
i  2.6        An  OTcrflow  Model  with  Restricted  State-space 

In  this  section,  we  consider  a  model  in  which  overflow  occurs  even  before  a  queue  is 
full.  The  resulting  problem  is  still  a  homogeneous  system  of  the  form  A  p  =  0,  but  with 
some  of  the  entries  of  p  set  to  zero.  Thus  the  dimension  N^  of  A  is  less  than  the  dimension  N 
of  the  preconditioner  Aq  discussed  in  9  2.1.  We  introduce  two  methods  here  to  solve  this 
queueing  problem.  In  the  Hrst,  we  partition  the  state  space  into  subspaces  in  which  we  can 
find  separable  preconditioners.  In  the  second,  we  embed  the  state  space  into  one  wh?re  we 
can  use  i4o  as  a  preconditioner. 

Let  us  consider  the  foUowing  2-queue  network.  Customers  entering  the  first  queue  will 
wait  and  be  served  at  the  second  queue  if  all  the  spaces  in  the  first  queue  is  filled.  Moreover, 
we  assume  that  a  customer  waiting  for  service  in  the  first  queue  is  moved  to  and  served  at  the 
second  queue  if  a  server  in  the  second  queue  becomes  available.  Thus,  some  of  the  states  are 
not  permissible  here.  More  precisely,  we  have 


Pij  =  0  Si<  i  <  n^,  0  ^  j  <  52- 

We  may  associate  the  values  of />,y  with  the  following  L-shaped  region: 


(2.6.1) 


/3  Flgxire  1 


n,-l 
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In  the  figure,  /i,  /j,  /j  are  line  segments  defined  by 

/i-{ii}x  [0,*2-l],  (2.6.2) 

/j  -  [  *i  +  1  ,  ni  -  1  ]  X  {  ,j  },  (2.6.3) 

/3  -  {  «!  -  1  }  X  [  ,2  ,  «2  -  2  ].  (2.6.4) 
For  simplicity,  we  let 

T2  -  'i  U  '2  U  '3  U  { ('!.•»:)  .  ("i-l,":-!) }.  (2.6.5) 

We  remark  that  this  model  is  similar  to  the  one  discussed  in  Kaufman,  Serry  and 

Morrison  [24],  except  that  we  have  added  a  feature.  Namely,  we  permit  overflow  from  the 

first  queue  to  the  second  queue  when  the  first  queue  is  full.    The  Kolmogorov  balance 

equations  of  our  model  are  given  by 

[Xi  (l-8,„  _i8y„^_iXy-,p  +  ^2(l-8/n,-i)  +  niin(i,jjni  +  minO,J2)M^  ]P;y 

=  (l-X/-i-,^Xi,-i-y)  [Xi(l-8/o)Pf-iy  +  (l-8y«,-i)  niinO  +  l,*2)  »^Pij+J 

+  (l-8yo)  [>^ii\Xs,-j  +  Sta.-iXy-,;  +  X2(l-x,-i-,^X,,-y)]  P,j-i  (2-6.6) 

+  (l-8/«,-i)  [(l-X/-,^Xi,-i-y)  niin(«  +  l,5i)ji,i  +  S2i^2Xi-,^ij,)Pi*ij, 
for  0  IS  i  <  n^  and  0  ^  j  <  nj.  Here 

h-   /  a  0. 

(2.6.7) 


_  /I,   l^  0, 
'^'  ~  |o,  Z  <  0. 


We  note  that  these  equations  imply  (2.6.1).  From  (2.6.6),  we  see  that  the  steady-state 
probability  distribution  p  satisfies  the  homogeneous  equation 

Ap  =  0.  (2.6.8) 

Here  A  is  the  generating  matrix  of  dimension  S^^  and  satisfies  the  assumptions  of  lemma 

2.1.1.  Since  p  is  a  probability  distribution,  we  supplement  (2.6.8)  with 

!>  =  !,  (2.6.9) 

p,j  a  0.  (2.6.10) 

By  lemma  2.1.1,  the  solution  p  to  (2.6.8)  -  (2.6.10)  exists  and  is  unique.   Moreover, 
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P,j  >  0.  (2.6.11) 

Considering  the  continuous  analogy,  we  find  that  the  matrix  A  resembles  an  elliptic 
operator  acting  on  the  L-shaped  region  with  Neumann  boundary  conditions  on  t^  and  oblique 
boundary  conditions  on  tj.  The  idea  of  the  previous  sections  would  suggest  the  partition 

A=A-¥R,  (2.6.12) 

with  A  resembling  the  same  operator  but  with  Neiunann  boundary  conditions  everywhere.  R 

will  then  be  an  operator  that  is  zero  in  the  L-shape  region,  but  has  tangential  derivatives 

along  Tj.   We  note  that  A  has  the  form 

T,  D,    0 
A=    E^  C^  D2  .  (2.6.13) 

0    E,    T, 

Here  7,  represents  couplings  between  the  pairs  of  states  in  n,,  C^  couplings  between  the  pairs 
of  states  on  the  interface  t,  and  D,  and  Ej  couplings  between  the  pairs  belonging  to  A,  and  t. 
The  dimension  of  C.  is  equal  to  the  number  of  states  on  t,  which  is  equal  to  12— Sj'  ^^'  ^ 
small  when  compared  to  the  dimension  of  T,,  which  is  equal  to  the  total  number  of  states  in 
CI,.  The  dimension  of  T^  is  s^n2  and  that  of  T2  is  (n2-«2)(/»i-5j^-l).  We  note  that  in  (2.6.13) 

^2  =  [->^iV,,.0r.  (2.6.14) 

D2  =  [-s,^>.rJn,-s,,0].  (2.6.15) 

where  0  is  the  zero  matrix  of  order  (n2-J2)  by  (ni—s^-l).  Thus  they  are  sparse. 

The  exact  form  of  A  can  be  obtained  by  using  the  equations  given  in  9  2.6.1.  More 
precisely,  by  (2.6.13) 


A  = 


where  the  submatrices  £2.  ^2'  ^i~^i'  ^0 


>i  D,    0 

E,  C2  D2 

+ 

0     0    72 

0  Ci-C2  0 


(2.6.16) 


E,  C2 


0      £2       0 

and  T2  are  given  by  (2.6.14),  (2.6.15), 


(2.6.26),  (2.6.29)  and  (2.6.30)  respectively.    It  is  then  easy  to  check  that  1*  A  =  0.  Since 
R  =  A  —  A  and  1*  A  =  0,  we  have 
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r^=l*A  =  0.  (2.6.17) 

We  claim  that  the  matrix  R  is  sparse.  We  first  note  that,  depending  on  the  ordering  of  the 

states,  any  index  j,  1  ^  j  ^  A/'^,  corresponds  to  a  unique  state  (j^  ^ji)  in  the  L-shaped 

domain.  Using  this  notation,  it  is  straightforward  to  check  that 


/?  =  /?,  +  /?, 


(2.6.18) 


Here  /?|  is  a  diagonal  matrix  given  by 


(^i)y;  = 


0  otherwise. 


(2.6.19) 


and  /?2  is  given  by 


-\,         (  *i ,  t2  -  1 )  and  (71 ,  ;2 )  ?  '1  U  h^ 
(  *2  )kj  =  j  -*2  >^2     (  *i  +  1  ,  *2  )  and  ( A  ,h)i  I2,  (2.6.20) 

0  otherwise. 

Thus  the  number  of  non-zero  rows  is  equal  to  the  number  of  states  on  T2,  which  is  equal  to 

^/T  ■  "i  +  "2  -  •»!  -  1-  (2.6.21) 

Moreover,  every  such  rows  has  at  most  two  non-zero  entries.  It  can  be  shown  that  A  still 

satisfies  the  assumptions  of  lemma  2.1.1.  Hence  it  is  singular  with  a  one  dimensional  null- 
space.  However,  A  is  not  separable.  Hence  A'^  x  cannot  be  computed  economically.  In  the 
following,  we  will  design  singular,  separable  preconditioners  that  are  close  to  A  in  the  sense 
that  they  represent  the  same  operator  in  the  L-shaped  region  and  have  the  same  type  of 
boundary  conditions.  The  first  idea  comes  from  the  theory  of  substructuring. 

92.6.1         Partitioning  of  the  State-space 


The  method  of  substructuring  have  been  used  for  solving  elliptic  problems  defined  in 
irregxilar  regions,  see  Bj0rstad  and  Widlund  [7],  Dryja  [14],  Bramble  et  al.  [8],  Buzbee  et  al. 
[10]  and  Concus  et  al.  [11].  The  idea  is  to  partition  the  problem  into  subproblems  which 
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correspond  to  subregions  into  which  the  original  region  has  been  partitioned.  We  can  then 
solve  each  of  these  subproblems  separately  by  direct  methods  while  the  Literactions  between 
the  subregions  are  solved  by  a  direct  or  iterative  method.  Since  the  number  of  nodes  on  the 
interface  usually  is  small  compared  to  the  number  of  nodes  in  each  subregions,  the  size  of 
this  interface  problem  is  usually  small  compared  to  the  original  problem.  If  we  are  using 
iterative  methods  to  solve  the  interface  problem,  then  in  each  iteration,  we  will  have  to  solve 
the  subproblems  once  in  each  subregion.  However,  if  the  boundary  conditions  for  the 
original  region  are  such  that  separation  of  variables  is  possible,  then  solving  the  subproblems 
by  direct  methods  will  reqoire  very  little  amount  of  work. 

To  be  more  specific,  let  us  consider  the  problem  of  solving  Laplace's  equation  in  the  L- 
shaped  region  depicted  in  Figure  1,  with  Neumann  boundary  conditions  everywhere.  The 
theory  of  substructuring  suggests  the  following  preconditioner.  We  first  solve  the  problem 
defined  on  H^  with  Neumann  boimdary  conditions  on  the  boundary  of  Cl^  including  t.  This  is 
a  separable  problem.  Having  solved  this  problem,  we  use  the  value  of  the  solution  on  the 
interface  t  as  Dirichlet  data  and  solve  a  Dirichlet-Neumann  problem  on  Qj  with  Dirichlet 
boundary  condition  on  the  interface  t  and  Neumann  boundary  conditions  on  the  remaining 
three  sides.  This  problem  is  also  separable. 

Using  the  analogy  between  the  queueing  model  and  this  continuous  problem,  we 
construct  our  preconditioner  accordingly.  The  numerical  results  in  §  4  show  that  this 
preconditioner  is  very  good.  In  matrix  terms,  we  partition  our  matrix  A  as 


A  =  A  +  R, 


(2.6.22) 


where 


A  - 


Aq    0 

T,  D,    0 

£2  T2 

= 

E,  C,    0 

0    £2   T, 

(2.6.23) 


Here 


^2  -  [  0  .  ^2  ]. 


(2.6.24) 


and 
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^0- 


£,  C, 


widi 


By  (2.6.12)  and  (2.6.16), 


C:  =  C,  -  Xi  •  I„^-,^ 


(2.6.25) 


(2.6.26) 


0        0         0 

R  =  R  +    0  C^-C^  D2 
.0       0        0 

Since  R,  D^  and  C'j  -  C2  are  sparse,  R  is  also  sparse.  In  fact  it  has  at  most 


(2.6.27) 


w  -  n^  +  2  rtj  -  Ji  -  52  -  2  (2.6.28) 

non-zero  rows,  and  every  such  rows  has  at  most  two  non-zero  entries.   Notice  that  the  sub- 
matrix  Aq  corresponds  to  a  Neumann  problem  on  the  subregion  fl^.  In  fact, 

A'o  =  C:  0  /„,  +  /,^^i  ®  Gj,  (2.6.29) 

where  G^,  of  dimension  J;+1,  is  the  same  as  G,  in  (2.1.6)  but  with  j;  replacing  n,-l  there. 

Thus  i4o  is  the  generating  matrix  of  the  free  model  discussed  in  9  2.1  with  s^  spaces  in  the 

first  queue  and  n2-l  spaces  in  the  second  queue.  Hence  by  lemma  2.1.2,  Aq  is  separable,  has 

a  one  dimensional  null-space  and  a  positive  null-vector.  Let  us  denote  its  null-vector  by  pq. 

On  the  other  hand,  Ti  corresponds  to  a  mixed  type  problem  defmed  on  Cii-  In  fact. 


Here 


72  =  V^®/„^_,^-^/„_,_i®V2. 


Vi  -  tridiag  (  Xi  ,  Xi  +  *i  p.1  ,  Ji  jti )  -  Xi  •  *,^-,^-i  e^.,^. 


(2.6.30) 


(2.6.31) 


is  a  matrix  of  order  n^—Si—l  and 


V2  -  tridiag  (  X2  ,  Xj  +  52  »J^2  .  *2  M-z  )  -  '2  V-i  •  «i  «I  -  Xj  •  e„^.,^  e''„^.,^    (2.6.32) 
is  a  matrix  of  order  n2~'*2-    ^*  '*  dear  that  Ti  is  separable  and  since  V^  is  irreducibly 

diagonally  dominant,  T2  is  non-singular.    Thus  by  (2.6.23),  A  is  singular,  and  has  a  one 

dimensional  null-space.  The  null-vector  of  A  is  given  by 


P  = 


Pa 
-T^^E^Po 


(2.6.33) 
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We  note  that  the  p,,  are  not  necessarily  positive.  Let  us  define  the  generalized  inverse  A"^  of 
A  as 


A*  = 


At  0 


(2.6.34) 


Similarly  to  lemma  2.1.2,  we  have 


LEMMA  2.6.1 


(i)  Im{A)  =  {x  = 


€  /? "  I  Xi  6  /m(Ao)  }. 


(ii)  A"^  is  invertible  on  Im(A).    More  precisely,  for  all  y  C  Im(A),  there  exists  a  unique 
X  (.  Im(A),  such  that 


A    X  =  y, 


where  x  =  A  y.  Thus  for  all  y  €  Im(A), 


A  A    y  =  A    A  y  =  y. 

(iii)  For  all  y  (.  R'  *,  there  exists  a  imique  a  and  5  €  /m(A)  such  that, 

y  =  ap  +  A^  i. 


(2.6.35) 


(2.6.36) 


(2.6.37) 


(iv)  Let  p  be  the  solution  to  (2.6.8)  -  (2.6.10),  then  there  exists  a  unique  a  ^  0  and 
i  i  ImiA)  such  that, 


p  =  op  +  A*  5. 


(2.6.38) 


Proof:         The  proof  of  (i)  -  (iii)  follows  easily  from  lemma  2.1.2  and  the  fact  that  T2  is 
non-singular.  Thus  let  us  prove  (iv).   By  (iii),  it  suffices  to  show  that  a  #  0.  Suppose  a  =  0, 


then  p  = 
t2.6.11).    a 


Po 
Pi 


=  A"*"  4  € /m(A).     By    (i),    Pq  i  Im(AQ),    hence    1*Pq  =  0,    contradicting 


We  remark  that  even  though  A  cannot  be  symmetrized,  we  still  have  a  decomposition  of  the 
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state  space  as  in  (2.6.37).  Notice  that  in  (2.6.23),  A  is  in  block  lower  triangxilar  form.  Since 
Aq  and  Tj  are  separable,  and  by  (2.6.14)  and  (2.6.24),  Ej  is  sparse,  thus  A*  x  can  be 
computed  easUy  for  any  x  i  Im(A).  We  remark  that  £2  P'cks  up  the  Dirichlet  data  on  t. 

Using  (iv)  in  lemma  2.6.1  and  since  p  is  imique  up  to  a  multiple  constant,  we  may  let 

p=p+A^  5o.  (2.6.39) 

and  normalize  it  by  (2.6.9)  after  we  find  it.  Putting  this  expression  into  (2.6.8),  we  get 

(/  +  /?A*)?o=  -^P-  (2.6.40) 

It  can  easily  be  checked  that  R  p  *  0.  Notice  that  in  the  splitting  (2.6.22),  we  do  not  have 

1'  A  =  0.    Thus  Im(A),  and  hence  Im(R),  are  not  necessarily  in  Im(A).    Hence  we  cannot 

draw  the  same  conclusion  as  in  lemma  2.3.1  that  the  matrix  { I  +  R  A"^  )  is  nonsingular. 

However,  it  is  clear  that  this  matrix  maps  Im(R)  into  itself.    By  (2.6.40),  it  is  also  dear  that 

5o  C  Im(R).  By  the  existence  and  uniqueness  of  ?o,  the  matrix  is  invertible  in  Im(R).   Thus 

we  may  proceed  to  solve  for  5o  by  considering  (I  +  R  A^  )\rm<iiy   ^  mentioned  in  9  2.2,  if 

we  use  the  Orthodir  method  to  solve  this  kind  of  equations,  then  there  is  no  need  to  do  the 

restriction  explicitly.  We  remark  that  since  R  has  m  non-zero  rows,  where  m  is  given  by 

(2.6.28),  (2.6.40)  is  practically  an  m  by  m  system. 

Let  us  calculate  the  cost  of  computing  R  A*  x  for  any  x.  Since  R  has  at  most  2m  non- 
zero entries,  R  x  can  be  computed  in  2m  operations.  Next  let  us  consider  solving 


A 

>o' 

^0 

\ 

yi 

^1 

where  Xq  C  /m(y4o).  Notice  that  by  (2.6.23),  A^y^  =  Xq-  Id  ^cw  of  (2.6.29),  this  system  can 
be  solved  by  the  Alternative  A  of  §  2.3.1  in  2/if  operations.  Let  us  remark  that  when 
i|  «  n2,  Alternative  B  ir.  Appendix  A.1  requires  only  4n2Ji  operations.  Having  found  y^, 
we  solve  Tiyi  =  x^  —  E2  yo-  Since  £2  '*  sparse,  the  right  hand  side  of  this  equation  can  be 
computed  in  n2—S2  operations.  Notice  that  in  (2.6.30),  Tj  is  separable.  Thus  y^  can  be  solved 
by  first  diagonalizing  V2  in  (2.6.32)  and  then  solving  the  resulting  tridiagonal  systems;  see 
Alternative  B  in  Appendix  A.l  for  more  detail.  We  note,  that  we  can  use  the  Fast  Fourier 
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transform  to  perform  the  diagonalization.  This  follows  from  the  fact  that  V2  is  the  same  as 
the  G,  in  (2.3.20),  but  with  the  service  rate  *2  jtj  rather  than  ji,.  Thus  the  work  is  roughly 

5  ('«2-*2)('»i~»i)  +  2  (n^-Sj)  log  (n:-*:). 
The  first  term  here  is  the  work  required  to  solve  the  resulting  tridiagonal  systems.  Combining 

these  results,  we  see  that  the  work  required  in  the  j-\h  iteration  is 

4  nj^i  +  5  (n2-*2)(''i~'i)  "•■  2  (n2~*2)  'o?  ('»2~'2)  "*"  ^  ym,  (2.6.41) 

and  the  memory  requirement  is 

2jm  r  si  +  0(n,-s,).  (2.6.42) 

We  note  that  there  are  many  other  viable  separable  preconditioners.  For  example, 
instead  of  solving  the  Dirichlet-Neumann  problem  corresponding  to  T;,  we  may  solve  a 
Dirichlet  problem  on  ^2-  ^^*  '*  *^so  a  separable  problem.  However,  from  the  discussion  in  5 
2.2.1  and  in  Bj^rstad  and  Widlund  [7],  we  expect  that  this  preconditioner  will  not  lead  to  an 
optimal  method. 

i  2.6.2         Embedding  of  the  State-space 

Capacitance  matrix  methods  have  been  developed  for  solving  elliptic  problems  in 
irregular  regions  such  as  the  L-shaped  region  in  Figure  1,  see  O'Leary  and  Widlund  [32], 
Proskurowski  and  Widlimd  [35]  and  Astrakhantsev  [1].  The  idea  is  to  embed  the  state-space 
into  a  larger  space  where  there  is  a  separable  preconditioner.  Here  we  design  an  algorithm 
that  adopt  this  approach. 

Recall  that  in  5  2.1,  i4o  resembles  a  Neumann  problem  on  the  whole  rectangular  region 
[0,  nj^— 1]  X  [0,  nj— 1].  If  we  order  the  states  in  the  L-shapcd  region  first,  then  we  can  write 


^0  = 


■^21   ^22 


(2.6.43) 

where  A^  and  A22  are  square  matrices  of  dimension  N^  and  N  -  N^  respectively.  We  claim 
that  A22  is  nonsingular.   In  fact,  by  the  definition  of  Aq, 
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All  =  Vi  ®  I,^  +  h^.,.,  ®  G:,  (2.6.44) 

where  V|  is  given  by  (2.6.31)  and  C^,  of  dimension  Si,  is  the  principal  submatriz  of  G-^ 

obtained  by  deleting  the  last  nj-s^  cxjlumns  and  rows  of  Gi-  Since  V^  and  Gi  are  irredudbly 

d'Pgonally  dominant,  Aji  is  nonsingiilar. 

Consider  the  A^  by  A^  matrix 


Av- 


A  A 
0  A 


(2.6.45) 


where  A  is  the  generating  matrix  in  (2.6.8).  Qearly  if  p  is  the  solution  to  (2.6.8)  •  (2.6.10), 


then  py  = 


is  the  unique  solution  to 
Ay  py  =  0, 


and 


iPs)kj^^,    l^kjsN. 


(2.6.46) 
(2.6.47) 

(2.6.48) 


Here  ly  =  (  1,  1,...,1 )  €  R^'.   Since  ly  Aq  =  0  and  1*  A  =  0,  it  follows  that 


lvAy  =  0.  (2.6.49) 

Moreover,  by  (2.6.47)  and  (2.1.25),  there  exists  a  unique  ?o  ^  ^"'(Aq),  such  that 


P.v  -  Po  +  ^o"  ?o» 


where po  >*  given  by  (2.1.11).  Define 

Ay  "  Ay  ~  Ag  = 

Then  (2.6.46)  and  (2.6.50)  imply  that 


A-Aii  0 


(2.6.50) 


(2.6.51) 


(/  +  /?yAoM?o=  -^.vPo-  (2.6.52) 

By  (2.6.49)  and  lemma  2.3.1,  the  matrix  (/  +  Rs ^o  )  "  nonsingular.  Thus  we  can  solve 

(2.6.52)  either  by  direct  or  iterative  methods. 

We  claim  that  the  matrix  R^-  is  sparse.  In  fact,  by  the  definition  of  Aq 


_(~^y         (*i-l.  *2)and(;i,;2)  €  /i, 
(  ^21  )kj-  y    0  otherwise. 


(2.6.53) 
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On  the  other  hand,  it  is  straightforward  to  check  that 

An  =  A  +  0,  (2.6.54) 

where  A  is  given  by  (2.6.16)  and  D  is  a  diagonal  matrix  such  that 


^JJ'{ 


^1         (JiJi)^^' 

0  otherwise.  (2.6.55) 


By  (2.6.12)  and  (2.6.54),  A  -  A^^  =  R  -  D.  From  (2.6.18)  -  (2.6.20),  we  see  that  J?  -  Z? 
stiU  has  N/f  nonzero  rows  and  every  such  rows  has  at  most  two  nonzero  entries.  Here  Nj^  is 
given  by  (2.6.21).  Moreover,  by  (2.6.51)  and  (2.6.53),  the  number  of  nonzero  rows  in  ^v  •* 
equal  to 

N^  ^  N/f  +  S2  =  rij^  +  n2  +  S2  -  Sj^  -  1,  (2.6.56) 

and  every  such  rows  has  at  most  two  nonzero  elements.  Hence  /?v  •*  sparse. 

Using  the  sparsity  of  /?v.  we  can  reduce  the  dimension  of  the  problem  (2.6.52)  to  N,f.  If 
we  use  the  iterative  methods  discussed  in  9  2.2  to  solve  (2.6.52),  then  in  each  step,  we  have 
to  compute  vector  of  the  form  /?y  Aq  5  where  5  €  Im(/?v)-  From  (2.1.16),  we  see  that  though 
5  is  sparse,  the  computation  of  /?v  ^o"  5  still  requires  0(nf)  work  and  0(nj)  storage  spaces. 
These  counts  are  considerably  higher  than  the  counts  given  in  (2.6.41)  and  (2.6.42). 

This  algorithm  has  not  yet  been  tested.  However,  follows  from  the  fast  convergence  of 
the  capacitance  method  for  elliptic  problems  and  the  continuous  analogy  of  the  queueing 
models,  we  conjecture  that  this  algorithm  also  has  a  fast  convergence  rate. 

This  concludes  our  discussion  on  different  algorithms  for  different  overflow  models.  In 
the  next  section,  we  will  analyse  theoretically  the  convergence  of  our  method  for  the  model 
discussed  in  9  2.3.1. 
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Section  3.  Analysis  of  a  Model  Problem 


In  this  section,  we  will  establish  the  fast  convergence  for  the  2-queue,  single  server, 
one-direction  overflow  network  discussed  In  9  2.3.1.  More  precisely,  we  will  obtain  an  upper 
bound  for  the  number  of  iterations  reqxiired  to  attain  a  given  accuracy.  The  main  step  is  to 
show  that  the  iteration  matrix  has  the  form  (n-I  +  L  +  U,  where  co  is  a  scalar,  L  is  a  matrix 
of  low  rank  and  (/  is  a  matrix  of  small  norm.  A  good  upper  bound  on  the  number  of 
iterations  can  then  be  obtained  by  choosing  a  suitable  polynomial  in  (2.2.3).  We  note  that  the 
convergence  theory  of  the  ordinary  conjugate  gradient  method  (2.2.1)  is  much  more  complete 
than  that  of  the  Orthodir  method  (2.2.5),  see  for  examples,  Axelsson  [2]  and  Van  der  Vorst 
[39].  Thus  in  the  following,  we  will  asstmie  that  the  matrix  equations  are  solved  by  applying 
(2.2.1)  to  the  corresponding  normal  equations.  We  will  first  consider  the  case  where  the 
traffic  density  X,  <  ji,.  We  will  then  discuss  the  cases  where  X,  a  ji,.  We  note  that  the 
problem  where  X,  =  p.,  is  the  one  we  discussed  in  §  2.2.2,  namely,  the  problem  of 
preconditioning  the  oblique  BVP  by  the  Neumann  BVP.  We  remark  that  this  problem  is 
much  more  difficult  than  the  problem  discussed  in  9  2.2.1  because  the  operator  is  not  self- 
adjoint. 

9  3.1         Defining  the  Parameten 

Recall  that  when  we  expand  p,^  in  (2.3.1)  with 

St  =  1,  (3.1.1) 

and 
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h,  =  (n,  -  1)-S 


1  =  1,2, 


(3.1.2) 


then  the  underlying  continuous  equation  is  of  the  form  (2.3.28).  Thus  to  obtain  a  sensible 
limit  when  the  queue  sizes  n,,  <=1,2,  are  large,  we  assume  that  the  traffic  density  and  the 
queue  size  satisfy  the  relation 


-^=1-3,  A,-,  ,-=1,2, 

where  (i;,  0(,  i=l,2  and  a  are  constants  independent  of  A,.  We  r?mark  that 


(3.1.3) 


LEMMA  3.1.1 

For  arbitrary  {  X,  ,  p.,  ,  A,  }}.^,  such  that  0  <  X,  <  jt,  and  0  <  A,  s  1,  j  =  1,2,  then 


O  "  fnin 


logfl-^) 


log  A, 


>o, 


(3.1.4) 


and  (3.1.3)  holds  with  0  <  3,  s  1,  for  i=l,2.    a 

The  proof  is  straightforward.  In  view  of  this  lemma,  we  assume  in  the  following  that 

a  2  0  and  0  <  3,  s  1,      i  =  l,2.  (3.1.5) 

Moreover,  in  order  to  avoid  taking  two  limits  simultaneously,  we  assume  that  when  A,, 

j  =  l,2,  tend  to  zero,  the  compatibility  condition 


-i-  =  r 

holds,  where  Cq  is  a  constant  independent  of  A,.  We  note  that  by  (3.1.3)  and  (3.1.6) 

or  eqmvalently, 

■r  =  —  =t  o(Ar). 

We  assume  in  the  following  that  there  exists  a  C  >  0,  independent  of  h„  such  that 


(3.1.6) 


(3.1.7) 


1        ^i 


(3.1.8) 
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We  remark  that  we  can  normalize  the  problem  by  setting 


»ti=l. 


(3.1.9) 


Since  X^  <  jti.  we  have 


Xi  <  1.  (3.1.10) 

In  the  following,  we  use  C  to  denote  any  generic  positive  constant  that  depends  only  on  a,  \l, 

and  P(,  1  =  1,2,  and  is  independent  of  A,. 

To  begin  the  analysis,  we  first  note  that  by  (3.1.1),  (2.1.7)  becomes 


where 


and 


S,  =  a,  •  diag  (  1  ,  p, p"'  * ).  «  =  1,2, 


1-p? 


U-p/   J 


Notice  that  for  a  a  0  and  h,  sufficiently  small, 


(3.1.11) 


(3.1.12) 


(3.1.13) 


p^-^>  =  (i-3,Ar)''  =  * 


f       i;"'«(^-e-*:)^^-e^.-i_o(,,^-., 


Thiu  we  have, 


LEMMA  3.1.2 


For  all  a  2  0,  there  exists  H,  =  W,(o,P,),  i  =  1,2,  such  that  for  all  A,  <  H,,  we  have 


and 


where 


^2(n-i)  ^  ^-p^--»  ^  ^^  ^  g^^  if  0  s  a  <  1, 


p,^(Vi)^,-:W>,:>0.  If  a^l. 


(3.1.14) 


(3.1.15) 


c,  =  c,(a,p,)  -  r"-^"'.   a 


(3.1.16) 
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We  remark  that  when  o  ^  1  and  h,  <  Hf,  by  (3.1.15),  we  also  have 

p("ri)  ^  ^-ez-i"-^  a  1  -  3,  hr^.  (3.1.17) 

This  follows  from  the  fact  that «"'  a  1  -  x  when  0  s  x  s  1. 

From  this  lemma,  we  see  that  there  are  three  different  cases  to  be  considered,  namely, 
the  cases  where  a  <  1,  a  =  1  and  a  >  1. 
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S  3.2        The  Case  when  a  <  1 

For  o  <  i,  the  problem  of  solving  p  in  A  p  =  0  approaches  the  separable  problem 
Aq  Pq  =  0.  More  precisely,  we  have 

THEOREM  3.2.1 

If  0  s  a  <  1  and  0  <  A,  <  A^„  for  i=l,2,  then 


\Apo\\2rS  $e 


-i^xr' 


Proof:        By  (2.3.2),  (2.1.2),  (2.3.5),  (2.1.11)  and  (3.1.11),  we  have 
A  Po  ~  i^o  +  ^o)  Po  ~  *o  Po 

Thus  by  (2.3.6),  (3.1.10),  (3.1.11),  (3.1.13)  and  (3.1.14), 

\\Apo\\2=iad'pT~'U\'R:Slh\\, 


=  ^ipl  ' 


<  2 


f  1-pi  1 

i-el 

[i-pJ-'J 

An  -6 


T* 


1^P2' 
(1  +  P5) 


1-Pl  '        1-P2  ' 


1  2(Ti) 
Pi    ' 


KSe-^^'^".    D 


(3.2.1) 


This  lemma  shows  that  when  a  <  1  and  n^  suffldently  large,  p^  is  already  a  good 
approximation  to  p.  In  fact,  when  a  <  1,  (po)ij  is  exponentially  small  for  «  close  to  n^— 1. 
Hence  the  direction  of  the  derivative  along  the  boundary  /  =  n^—l,  oblique  or  Neumann, 
does  not  have  much  effect  on  the  solution.  As  an  example,  we  note  that  when  a  =  0  and 
p?  =  3,  =  V4,  1  =  1,2,  then  ||  A  Pq  Hj  s  10"^°  when  n,  a  32. 
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9  3.3         An  Equivalent  Problem 

In  the  next  few  sections,  we  will  analyse  the  spectrum  of  the  iteration  matrix  B*  B  when 
o  a  1.  Here  B  is  given  by  (2.3.17).  However,  to  simplify,  we  will  first  carry  out  a  similarity 
transformation  of  B  and  work  with  the  transformed  matrix.  This  is  the  same  as  if  working 
with  B  but  using  a  different  vector  norm.  This  norm  is  equivalent  to  the  I2  norm  when  a  a  1. 

Define 

B,  -  Ql  S{'  BS2Q2  =  I  +  Ql  S{'  ^R,  5:  (2:  <I»-  (3.3.1) 

We  note  that  the  transformation  matrix  is  bounded  in  the  /j  norm.  More  precisely, 

LEMMA  3.3.1 

For  a  a  1  and  Aj  <  H2, 

\\'2{'S2Q2\\2  =  ai'\\S2\\2=  1.  (3.3.2) 

and 

II  «2  Q'l  S{'  lb  =  a,  II  5f  ^  II,  =  p;'"^"''  ^  -^.  (3.3.3) 

Here  a,  and  c,  are  given  by  (3.1.13)  and  (3.1.16)  respectively,     o 

The  proof  follows  immediately  from  (3.1.11)  and  (3.1.15).  We  remark  that  by  using  (3.1.17), 
we  have  a  much  better  bound  for  (3.3.3),  namely, 

a2\\S{'\\2^(l-^,hr'r'.  (3.3.4) 

Using  this  lemma,  we  claim  that  the  spectnim  of  B'  B  and  B'  B,  are  equivalent.  More 

precisely,  we  have 

LEMMA  3.3.2 

Let  the  singiilar  values  of  B  and  fl,  be  0  s  ct^  s    •  •  •   ^  a„  and  Q  ■&  6^ 
respectively.  If  a  a:  1  and  A,  <  H,  then 


"2 


C2  a/  s  d,  s  —  (Ti,  1  s  y  s  n,.  (3.3.5) 

Cj 
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Proof:        Given  an  arbitrary  subspace  R^  of  dimension  j  in  R"*,  it  is  easy  to  sec  that,  for  all 
«  e  Ql  S2^  •  R', 

uB'B,u    ^  u   Ql  5;  B'  5;-^  BS2Q2U 
u'  u  u'  u 

_  z' 4  52^2   y'B'By  x'a^'^Slx 

~  .  •  .  » 

z   z  y  y  XX 

where  x  =  Q2U,  y  =  Sjx  and  r  =  5  y.  By  the  Courant-Fischer  theorem,  sec  Parlett  [34], 
and  lemma  3.3.1, 

.2^  »b:b.u 

a,  s       max        ; 

z'  a\  $2^  z             y'  B'  By              ''  a:"^  ^2  -^ 
■s  max = — •  max  ^ ^  •  max ; 

1  y' B' B  y 

s  -^  •  max  J   "^"  J    . 

c\     y^Ri     y  y 
Hencu  using  the  Courant-Fischer  theorem  again,  we  have 

The    other    inequality    in    (3.3.5)    can   be    established    similarly    by    using   the    mnximin 
characterization  of  the  j-th  singular  value,     o 

Thus  the  condition  number  of  B'  B  can  be  expressed  in  term  of  the  condition  r.'jmber  of 
fl*  Bj.  We  remark  that  by  using  (3.3.4)  instead  of  (3.3.3),  we  have 

(l-^ikr^)ajSajSil-  ^,hr^y^<Tj,  isysn^.  (3.3.6) 

Hence  for  A;  sufficiently  smaU, 

\aj-aj\<2&,hr\  isysnj.  (3.3.7) 

The  tools  required  to  derive  the  bound  for  6^  and  a^  are  developed  in  the  next  section. 
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i  3.4        Matrix  Identities  and  the  Norm  of  <P 

In  this  section,  we  will  prove  three  lemmas.  The  first  summarizes  all  the  matrix 
identities  we  need  to  derive  the  bounds  on  a^  and  d„  .  The  next  two  give  the  bounds  for  the 

extreme  eigenvalues  of  ^. 

Let  us  begin  by  defining 


*,- 


P2 
-1    P2 

0 


0 


-1     P2 

-1  0 


(3.4.1) 


and 


G. 


P2  -1 

-1    P2+j-    -1 
P2 


0 


0 


-1  p,.x  _1 


-1 


_1_ 

P2-I 


z  - 


^  1 

P2 

-10  1        0 

0     -1   0 

1 

-1 

-P2 

(3.4.2) 


(3.4.3) 


r,  -  (  ^2  M-2  )"'*     ^2  -  diag  (  7i f     ) 


where 


(3.4.4) 


^        (X2  .^2)'' 


'P2  + 2cos9,    1  s  ;■  <  nn, 

P2 


J  =  "2. 


(3.4.5) 
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with 


^  "2 


1  S  ;•  <  nj. 


(3.4.6) 


LEMMA  3.4.1 

The  following  matrix  identities  hold  for  all  P2  >  0. 

(1)  S2'G2S2  =  i\2i>'2)'*G„ 

(2)  Ql  G,  Q2  =  r„ 

(3)  S2'^R,S2  =  ^R„ 

(4)  R,  R;  =  pj  G„ 

(5)  Z,G,  +  G,z:  =  0, 

(6)        Ql  z,  Ql  r,  +  r,  Q\  z;  q^  =  o, 

(7)  2/?,  =  G, +  (p2--f)/    +Z,. 

pj       J 


(3.4.7) 
(3.4.8) 

(3.4.9) 

(3.4.10) 
(3.4.11) 
(3.4.12) 

(3.4.13) 


Proof         (1)  and  (2)  are  just  restatements  of  (2.3.20)  and  (2.1.12)  respectively.  (3)  -  (5)  can 
be  obtained  by  straightforward  computations.  (6)  follows  from  (2)  and  (5).  (7)  can  be 


obtained  by  writing  R,  =  —^—- — -  +  — ^ 


We  remark  that  the  variables  X,  and  p.,  have  dimension  (1  /  time).  In  contrast,  the 
variables  pj  and  "/y,  1  s  _;'  <  nj,  are  all  dimensionless.  Thus  the  matrices  R^,  G,,  Z,  and  Y , 
are  dimensionless.  Accordingly,  we  define  the  dimensionless 


4>.-(X2H2)''cD, 
where  *  is  given  by  (2.3.27).  We  note  that  by  (3.3.1),  (3.4.9)  and  (3.4.14), 

i'.  =  /+^G2^,G2*.- 


(3.4.14) 


(3.4,15) 


Moreover,  by  (2.3.27),  (3.4.5)  and  (3.4.14),  the  diagonal  entries  <}>y  of  4>,  are  given  by 


<l>y=(X2>t2)'**;=r- 


1       ^T^ 
1  -  h.  ^-"J 


p^  1-^;"' 


1  S  y  <  nj, 


(3.4.16) 


where  Oj  are  given  by  (2.3.26).   In  view  of  (3.4.5),  Oj  is  the  smallest  root  of 
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aj  -  (Pi  +  —  +  {  -yy)  fly  +  1  =  0, 
Pi 


where 


1  ^j  <  "2. 


(3.4.17) 


{- 


^2  tJ-Z 


>0. 


We  note  that  by  (3.1.8),  there  exists  a  C  >  0  such  that, 

c-^<  {  <  C. 
Since  the  constant  term  in  (3.4.17)  is  1,  thus 


(3.4.18) 


(3.4.19) 


where 


Since 


o»;  -  (u^j  -  4)^  _ 


"^  2  (0^  +  (0)2-4)^' 


1  ^  y  <  "2, 


«/  ■  Pi  +  —  +  { 7/.       1  ^  ;■  <  '»2- 
Pi 


X+   -22 

X 


for  all  X  2  0, 


yj>Oby  (3.4.5).  Hence  uy  >  2  by  (3.4.21)  and  ay  <  1  by  (3.4.20). 


LEMMA  3.4.2 


^j  is  a  decreasing  function  of  y  for  1  s  y  <  nj. 


(3.4.20) 


(3.4.21) 


(3.4.22) 


Proof:         By  (3.4.16), 


^      Pi  I    T'y    J      Pi  I  ^y 


2n  -1         l  +  fli 

T.      ^  •  ' 


1  ^"l 

1-ay   'j 


1  s  ;•  <  nj.        (3.4.23) 


We  claim  that  the  terms  in  the  right  hand  side  are  decreasing  functions  of  y.   By  (3.4.5),  we 
see  that  -yj  is  a  increasing  function  of  y  for  1  s  y  <  n2.  Thus  by  (3.4.21),  wy  increases  w.r.t. 


j.  Hence  by  (3.4.20),  Oj  decreases  w.r.t.  j  and  so  is  ay  ' 


2n.-l        1  +  '^Z      ^T  t-  I.    . 

^.  Next  we  observe  that 


1-a;^ 


by  (3.4.17), 


(ay-pi)(ay-— )  =  iyjOj. 
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Thus  the  first  term  in  (3.4.23)  cm  be  rewritten  as 

^  PiZ^  =  ^^1__  (3.4.24) 

Since  a,,  p^  <  1,  the  right  hand  side  is  a  decreasing  function  of  j.  Finally,  since 

llfi  =  iZ£i  +  £1:^,  (3.4.25) 

Try  7y  Tfy 

1     fl-fl/l 
is  a  sum  of  two  decreasing  function  of  /,  —    ^    is  also  a  decreasing  fuction  of  /.    d 

Pi   I   T'y    J 

LEMMA  3.4.3 

For  a  a  1  and  h,  <  H;,  i  =  1,  2,  we  have, 

(1)  ^j^  C  >  0  ,  for    1  s  J  <  n2,  (3.4.26) 

(2)  4>^  s  C  •  -^,  for    1  s  ;•  <  /tj.  (3.4.27) 

Proof:         To  prove  (1),  we  first  note  that  by  (3.4.5), 

7,  =  P2  +  —  -  2  cos  e,  <  P2  +  —  +  2,  1  s  ;■  <  nj.  (3.4.28) 

P2  P: 

Hence  by  (3.4.21) 

(lij  <  1],  l-sj<n2,  (3.4.29) 


where 


^  -  Pi  +  —  +  U  P:  +  —  +  2  ).  (3.4.30) 

Pi  P: 


By  (3.4.29),  oij  +  (oij  -  4)^  <  2-ri.  Thus  by  (3.4.20), 

^    0),  +  (o^j  -  4)^         1 
■n  >  — ^^ =  — ,  l^j<n2. 

Hence, 


2  a, 


Since  Oy,  p^  <  1,  the  last  term  is  positive,  hence 


(3.4.31) 


11  -  Pi  >  —  -  Pi  =  ^^.         1  =s  7  <  n2.  (3.4.32) 


■^ —  >  ^—  =  [—  +  U  P:  +  —  +  2)]-i  >  0.  (3.4.33) 


l~Piay         "n-pi  Pi  P2 


67 


Notice  that  the  second  term  in  (3.4.23)  is  nonnegative.  Thus  by  (3.4.24), 

'        Pitj         1    Pi  fly  Pi  P: 

By  (3.1.12),  p,  =  (  1  -  p,  h?  )*  >  (1  -  P,  H,»  )'*  >  0,  for  i  =  1,2,  thus 

p,-^  <  C,  i  =  1,2. 

Using  this  and  (3.4.19),  (3.4.34)  gives  <^y  a  C  >  0  f or  1  ^  ;  <  /i:- 

Next  we  prove  (2).  By  (3.4.16), 


(3.4.34) 


(3.4.35) 


*,=  ^ 

^       ^J 


l_.fL.iZfZ_ 
Pi        i_.^"i 


1-a 


1  s  7  <  /l2. 


We  note  that 


1— fli    '  ''i~l  1 

^ ^  .  This  IS  because  a,  <  1  and  /(f)  =    — (1  —  a')  is  a 


1       ^"i 
1-aj 


decreasing  function  of  t  when  0  <  a  <  1  and  t  >  0.  Thus 

if  Oi    n,  — 1  ^        p,  —  a,  a, 

<t>y  ^  -^    1  -  -^  -^—    =  -i^i ^  +  ^— ,  1  s  ;•  <  „2.       (3.4.36) 

^       yj   [        Pi     "i    J        "yyPi         Try  Pill 

Notice  that  in  (3.4.24),  the  right  hand  side  is  positive.  Thus  p^  a:  Oj  for   1  s  y  <  nj.  Hence 
by  (3.4.24)  again. 


Pi  -  fly 


.<£^^^ 


1 


Cfl 


^ 


1 


Since  p^  <  1, 


TTyPi  TTy"!         1-Pifly         If;  "i 


IrSjKn. 


^<4^--l 


''y  ^  1 
Notice  that  by  (3.4.17)  and  (3.4.22),  we  have 


l^j<n^ 


(3.4.37) 


(3.4.38) 


(fly-l)2  =  (p^  +  J-  _  2  +  {  T,y)  fly  2  {  -Ty  fly.  1  ^  7  < 


Since  ay  <  1,  this  implies 


1-a, 


(-^)^ 


1  S  ;■  <  nj 


(3.4.39) 


Hence  (3.4.38)  becomes 
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<|».  s  (i^)'*  +  -1-,  1  ^  y  <  nj.  (3.4.40) 


Notice  that  by  (3.4.22)  and  the  inequality  sin-;^  ^  -^,  we  have 

-y   =  pj  +  i  -  2  +  4sin2-:f  a  4sin2^  a  ■^,  1  ^  7  <  n^.        (3.4.41) 

P:  2  2         nj 

Thus  by  (3.1.6)  and  (3.4.19),  (3.4.40)  becomes 

4.yS-|-^+-T^s|(V(+-i)-i.sC-f.    ls;<„2.    a       (3.4.42) 

Recall  that  by  lemmas  2.3.2,  ({>„   can  be  defined  arbitarily.  Thus  for  simplicity,  we 
assume  in  the  following  that 

«*>„,  =  0.  (3.4.43) 

We  remark  that  this  corresponds  to  a  very  particular  choice  of  7  in  (2.1.15).  In  fact,  by 

(2.3.18),  we  see  that  (3.4.43)  holds  when 

"1-1 
7  =  -  i}ln,^)-'  2  {'in  J'  {l.j)-'-  (3.4.44) 

Using  (3.4.43),  lemmas  3.4.2  and  3.4.3,  we  obtain 

II  4),  II2  :s  C  nj.  (3.4.45) 

We  are  now  able  to  derive  an  upper  bound  on  the  condition  number  of  B,  and  B. 
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i  3.5         Condition  Nomber  ot  B  when  a  2  1 

An  upper  bound  for  the  largest  singular  value  of  B,  can  be  obtained  easily  for  o  a  1. 

LEMMA  3.5.1 

For  o  a  1  and  A,  <  ff„  i  =  1,  2, 

\\B.\\2^C-n,.  (3.5.1) 

Proof        By  (3.4.15)  and  (3.4.45) 

By  (3.4.10),  (3.4.8)  and  (3.4.28), 

II  R,  \\l  =  II  Rs  R:  II:  =  P2  II  G,  II2  =  P2  II  r,  II2  5  (P2  +  1)^  ^  4. 

Thus  II  flJIj  s  1  +  C  — ^  n2.  By  (3.1.8),  the  lemma  follows,     o 

^2 

To  derive  a  lower  bound  for  the  smallest  singular  value  of  B'  B^,  we  need 

LEMMA  3.5.2 

Let  B^  =  B,W  where  W  is  nonsingular.  If  X^miB^  +  fl*)  a  8  >  0,  then 

li^rMli^fllWll:.  (3.5.2) 

Proof        For  arbitrary  x,  using  the  Cauchy-Schwartz  inequality 

8  II*  Hi  ^  X^(5.  +  O  ||x  111  s  x'  (B„  +  B:)x  =  2x'B^x^  2  ||x  Hj  ||B„x  Hj. 
Define  y  =  B^x,  we  have, 

I|g;^y|l2  ^  2 

IIHI2  8- 
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2 
Since  y  is  arbitrary,  hence  ||  B~^  Wi  ^  -r    and 

II  B;'  lb  =  II  W  W-'B:'  II:  s  II  W  II2  II  BZ'  II:  ^  f  II^V  H^.    o 


To  find  an  appropriate  W,  we  first  define  the  generalized  inverse  <!)/  of  4>^  as 

*;  -  diag  (  4»r' 4)„-/-i ,  1  )•       ^  (3.5.3) 

Define  also  the  extension  Tq  of  T,  as 

To  -  diag  (  -yi 7„,-i  ,  1  )•  (3.5.4) 

We  note  that  <!>,*  and  Fg  are  nonsingular,  and  by  (3.4.43) 

4),  4>;  To  =  r,.  (3.5.5) 

Let  us  define  B^hy 

Bo  -  B,  4);  To.  (3.5.6) 

We  establish 

LEMMA  3.5.3 

For  all  a  >  1,  there  exists  an  0  <  //j  s  min  {  //;,  Hi  },  such  that  for  all  h^,  h^  <  Hy, 
we  have 

\rJ<B,  +  Bl)^C-hl.  O.S.T) 

EmaL         By  (3.4.15),  (3.5.5),  (3.4.13)  and  (3.4.8), 

^2 


=  4>;  To  +  I  ^  [  r,  +  (  P2  -  -i-  )•/  +  G:  Z:  C:  ]  V,. 
2    Xj  p: 


Thus  by  (3.4.12), 


b;  +  Bo  =  2<i.;  To  +  ^  [  r,  +  ( p:  -  -J- )  /  ]  r,. 

^^2  P2 


which  is  a  diagonal  matrix.  Hence  by  (3.5.3),  (3.5.4)  and  (3.4.5) 


-71- 

=  2-    min    { -y ,  •  [^7^  +  t-^  (  P2  "  cos8y )],!}. 
By  lemma  3.4.2,  4),"^  is  an  increasing  function  of  j  for  1  s  j  <  nj,  and  by  (3.4.5)  and 
(3.4.6),  so  are  -yy  and  -cosdy.  Thus 

X^iB'o  +  flo)  ^  2  •  min  { -y  J  4>ri  +  -!-  (  P2  -  cosd: )],!}.  (3.5.8) 

Notice  that  by  lemma  3.4.3,  <{>f  ^  a  CAj,  while  by  (3.1.3)  and  (3.4.6), 

P2  -  cos  01  =  iT^  Af  -  y  P2  '»2  "'■  higher  order  terms.  (3.5.9) 

Thus  when  a  >  1  and  Aj  sufficiently  small,  we  have  <J)f  ^  +  t—  (  P2  ~  cos  6j)  >  C-A2  >  0. 

^2 

This  implies  that  X^(flJ  +  Bq)  a  C-yi  A2.  Hence  by  (3.4.41),  we  get  (3.5.7). 


For  a  =  1,  by  (3.4.42)  and  (3.5.9),  we  see  that  for  all  sufficiently  small  A2,  the  right 
hand  side  in  (3.5.8)  is  positrve  if 

2  (  VT  +  -^  )-^  >  ^  P2  ^-  (3.5.10) 

Thus  we  have 

Corollary  3.5.4 

If  a  =  1  and  (3.5.10)  holds,  then  there  exists  an  0  <  II^  =s  min  {  H^,  Hi },  such  that  for 
all  Ai,  A2  <  //j,  (3.5.7)  holds,     d 

We  remark  that  if  X^  =  X2,  p-i  =  pl2  and  A^  =  Aj,  then  (3.5.10)  holds  for  all  0  s  ^2  ^  ^• 
By  (3.5.3)  and  (3.4.34), 

II  «!>;  II2  s     max     {  4,-1  .  1  }  s  {  p,  +  X  +  2  +  -^  }  <  C. 

l*y<"j        ^  P2  ?  Pi 

Moreover,  by  (3.5.4)  and  (3.4.28), 

II  To  II2  ^  P2  +  —  +  2  <  C. 
P2 
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Thus  by  (3.4.19)  and  (3.4.35), 

II  4>;  To  11:  <  C.  (3.5.11) 

Hence  by  (3.5.6),  (3.5.7)  and  lemma  3.5.2,  we  get 

LEMMA  3.5.5 

Assume  that  either  a  =  1  and  (3.5.10)  holds  or  a  >  1,  then  for  h^,  hj  <  H^,  we  have 

\\Br'\\2^C-nl    a  (3.5.12) 

Combining  this  result  with  lemma  3.5.1,  we  obtain 

THEOREM  3.5.6 

Let  k(Bj)  and  k(B)  be  the  condition  number  of  B,  and  B  respectively.  Assume  that 
either  a  =  1  and  (3.5.10)  holds  or  a  >  1,  then  for  h^,  hj  <  H^,  we  have 

k(B,)  s  Cn^  (3.5.13) 

and 

K(fl)  s  Cnl    a  (3.5.14) 

This  theorem  and  (2.2.4)  suggest  that  the  convergence  rate  of  the  ordinary  conjugate 
gradient  method  (2.2.1),  when  applied  to  the  normal  equations  corresponding  to  B  or  Bj, 
may  be  ertremely  slow.  However,  in  9  3.8,  we  will  show  that  the  method  converges  quickly 
as  a  consequence  of  a  clustering  of  the  singular  values.  To  do  so,  we  need  more  information 
about  the  matrix  <Pj  and  Z,.  For  simplicity,  we  will  also  reduce  the  dimension  of  the  problem 
further  to  n2-l. 
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i  3.6         More  Matrix  Idectltks  and  the  Approzlinatlon  of  4> 

In  this  section,  we  prove  three  lemmas  that  are  the  keys  in  proving,  in  the  next  section, 
that  the  singular  values  of  B  are  clustered.  The  first  two  lemmas  summarize  all  the  matrix 
identities  we  will  be  using.  The  last  one  gives  an  asymptotic  approximation  of  the  matrix  4>^ 
when  n,  are  large. 

Define  the  projection  Q^,  from  C2  **  ^^  "2  ^y":"  ^  matrix 

Qp-[li <I.,-i],  (3.6.1) 


where  ^,  are  the  i-th  column  of  Q2.  Define  also  the  n2- 1  by  ^2"  1  matrices 

<D^  -  diag  (  4>i 4>„^_i ), 

Tp   -  diag  (  7i .  .  .  .  .  7„^_i ), 


Li  -  L2, 


(3.6.2) 
(3.6.3) 

(3.G.4) 

(3.6.5) 
(3.6.6) 


and 


L,    - 


1 

0 

0 

0    0 

0 

•  0 

0 

0  0 

P: 

P2 

-1 

(3.6.7) 


L,    -(P-Lo  +  ^o*, 


(3.6.8) 


We  remark  that  L^  and  L2  are  matrices  of  rank  1,  while  Lq,  Ly  and  L^  are  matrices  of  rank  2, 
3  and  4  respectively.  Moreover,  these  matrices  are  symmetric.  We  note  that  by  (2.3.23)  and 
(2.3.21). 


(  Li  )t^  =  .^  -^  sin  ^2j.  sin  il':,/  =  ("  I)**"'  (  ^2  )kj. 


\-&k  ,j  <  n^.     (3.6.9) 
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LEMMA  3.6.1 

The  following  identities  hold  for  all  pj  >  0. 

(1)  /?;/?,  =  P2G,  +  L3. 

(2)      io  =  jo;(z;  +  zjo^, 

(3)  R:Q,-'e„^  =  0„^, 


(4)  QlZ,Qi<^,= 


(5) 
(6) 


Q'i2,Q^V,   = 


=  L,-^{  4>,  g;z,  0,  +  q;z:q,  4>p  }. 


(3.6.10) 
(3.6.11) 
(3.6.12) 

(3.6.13) 
(3.6.14) 

(3.6.15) 


Proof:  (1)  and  (2)  can  be  shown  by  straightforward  computations.  To  prove  (3),  we  note 
that  by  (2.3.23),  C:  ^'n,  =  Jj  Ij  =  «:  (  1  .  P:  >  •  ■  •  •  P:'"  )'•  where  oj «»  given  by  (3.1.13). 
Thus  by  (3.4.1),  (3)  follows.  To  prove  (4),  observe  that  by  (3.4.43),  Ql  Z,  Q2  <1>^  ^e„  =  0„  . 
Notice  also  that  by  (3.4.13),  (3.4.8),  (3.6.12)  and  (3.4.43) 

'\  Ql  Z,Q,<P,  =  {2-  'e'  Ql  R,  Q,  -  'e'  (  p,  -  -L  )./  _  i,'  y,  }  4., 

=  (  7-  -  P2  )  •  'S  «».  =  \ 

The  proof  of  (5)  is  similar  to  that  of  (4).  (6)  follows  from  (2)  and  (3.6.8).     n 


For  i  =  1,2,  define  the  ("2-1)  by  (nj-l)  matrices 


(-1         ^ 

where  (  ^q„  ^  ),  given  by  (2.3.23),  is  the  (n^,/)  entry  of  Q^  and 


(3.6.16) 


1i  = 


f7(Pi+  — -2cos-^)  l=s/<„,, 
fc  Pi  "1 


/  =  n, 


(3.6.17) 
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Here  7^^  and  {  given  by  (2.3.24)  and  (3.4.18)  respectively.  We  have, 

LEMMA  3.6.2 

Wi  and  W2  are  positive  semi-definite  matrices  and  satisfy 

(1)  iW,),j  =  (-iy-JiW,),j,  l^kj^nj,  (3.6.18) 

(2)       w,-w,  =  l{  *,  q;  z,  q^  +  q;  z;  q^  <i>;  }.  (3.6.19) 

Proof:  To  prove  that  the  W,  are  positive  semi-definite,  we  first  note  that,  the  L,  are 
symmetric,  and  the  (  F^  -f-  7/7  ),  Is  /  <  rt|,  are  diagonal.  Thus  the  W,  are  symmetric.  Next 
we  observe  that  by  (3.6.17)  and  (3.4.22), 

7,  a  1  (  2  -  2  cos  -^  )  =  -f  sin2  -^  >  0,  1  is  l<  n^.  (3.6.20) 

i  "it"! 

Hence  for  all  x  6  R'''~\  by  (3.6.16),  (3.6.4)  and  (3.6.5), 

x'W,x=^j:y,i  'q   J  )M  '*  (  r^  +  yrl  r'  Q;  \  Y  a:  0,  (3.6.21) 

X- tv: X  =  P2  2 -y/ ( \j )M x- ( r^  +  7,/ )-'q; \ ]' ^ 0.  (3.6.22) 

Thus  the  W,  are  positive  semi-definite. 

To  prove  (1),  we  observe  that  by  (3.6.16), 

(  Wt  \j  =  (  L,  \j  2 '-^-^ 7—,    1  =s  t,  y  <  flj,    ,•  =  1,  2.      (3.6.23) 

'-1  (  "Yt  +  7/  )  (  7y  +  "Y;  ) 

Thus  by  (3.6.9),  (1)  foUows. 

Finally  we  prove  (2).  By  (3.6.13)  and  (3.4.12), 

Q'p  ^'  Qp  ^P  +  ^P  Qp  2'  Qp  =  0-  (3-6.24) 

Thus  for  1  s  /  s  n^,  by  (3.6.11),  we  have 

C2;z.e,  (r,  +  7,/)  +  (r^  +  "yr/)e;z;Gp  =  y,{Q;(z,  +  z:)q^  } 

=  2  7,  Lq.  (3.6.25) 

Notice  that  by  (3.4.41),  7y  >  0  f or  1  s  >  <  hj.  Thus  F^  is  positive  definite.  By  (3.6.17)  and 

(3.6.20),  we  see  that  7,  a  0  for  1  s  /  s  n^.  Hence  (  F^  +  7,7 ),  1  s  /  s  n^,  are  positive 
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defmite  and  hence  invertible.  Thus  (3.6.25)  gives 

ir^  +  yrn-'Q;z,Q,+  c; z; g,  ( r^  + -J,/ )-i 

^2y,ir^-^yrn-'Loir^  +  yrn-K  l^l^n,.  (3.6.26) 

Recall  that  by  (2.3.16),  (2.1.14)  and  (2.1.15), 

Thus  by  (3.4.14),  (3.4.4)  and  (3.6.17), 

*.  =  S  ( \j  )H  r,  +  -J,  •  /  )-^  +  ( \^, )'  r;. 

(-1 
Restricting  to  the  (n2-l)-diniensional  space,  we  have,  by  (3.6.2)  and  (3.6.3), 


/-I 

Since  -y,^  =  0, 


<Pp  =  S(V;)'(r,  +  7;/)-^  (3.6.27) 

Thus  multiplying  (3.6.26)  by  (  ^q„  j  )^  and  taking  the  sum  from  /=1  to  n^,  we  get 

<t>p  Q;  Z,  Qp  +  Q;  z;  O;,  <I>;  =  2  2   -T:  (  '<?«,;  )H  ^,  +  ■/;•/  )-'  ^0  (  r^  +  7/-' )"' 

=  2  {  Wi  -  ^2  }, 
where  the  last  equality  follows  from  (3.6.6)  and  (3.6.16).    n 


Finally  we  give  an  approximation  4>-  of  <!>..  According  to  (3.4.23),  we  define 

*^  =  diag(<^i .^„^.l),  (3.6.28) 

where  ^j  is  the  first  term  of  (3.4.23),  i.e., 

^   =  ''^  ~  °/ .  1  s  ;•  <  nj.  (3.6.29) 

^         PilTy 

Define  also  the  constant 
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where  Cq  is  given  by  (3.1.6).  We  remark  that  Iq  depends  on  h,,  j  =  1  and  2.  However,  by 
(3.1.8),  there  exists  a  C^,  independent  of  h^  and  A2,  such  that 

{0  <  Ci.  (3.6.31) 

LEMMA  3.6.3 

For  all  a  &  1,  there  exists  anO  <  H^<  H^  and  C  >  0  such  that  for  A^  and  Aj  <  H^  and 
Ci  log  rtj  ^  7*  <  "2'  ^^  ^^^^ 

(1)  ay'''<C/n|,  (3.6.32) 

(2)  0<<^j-  4>j<  -f--  (3.6.33) 

Proof:         We  prove  (1)  first.  Observe  that  by  (3.4.21),  (3.4.22)  and  (3.4.41), 

"2 
Hence 

((u2-4)^>4Vf;/n2,  1  =s  ;  <  n^. 

Thus  (3.4.20)  gives 


ij<2(2  +  4{'i\  +  4Vl±)-i<(i  +  VlJ.  )-2^   l^j<  n^.      (3.6.34) 


If  ;•  2:  -^,  then  (3.6.34)  and  (3.1.6)  give 


'{ 


MY  My 


If  7  <  -^,  then  (3.6.34)  and  (3.1.6)  give 


"2 
By  (3.6.31),  we  have 
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^f.^r'j'^.-oi.^,      (oTi^j<^. 


Thus  for  A;  and  A2  sufHaently  small, 

af^  s  «"'^'S  for  1  s  j  <  ^.  (3.6.36) 

Hence  if  —7-  >  y  a  C^  log  Hj. 


2";^      -^c;^""'  -31o,«,  ^      1 

"5 


(3.6.37) 


Combining  this  with  (3.6.35),  we  get  (1). 

To  prove  (2),  we  observe  that  by  (3.4.23)  and  the  definition  of  ^j,  we  have 

2», 


^j-^j=  ^^-^;;-  ^.  1  ^  J  <  "2-  (3-6.38) 

'y  (1  -  ay    ^ 


p,  a,(l-a;^0     ^^ 


Since  Oj  <  1,  this  impUes  ^j  -  ^j>  O.By  (3.4.31),  (3.4.30),  (3.4.19)  and  (3.4.35), 

^  s  p.  +  -i-  +  {  (  p2  +  ^  +  2  )  <  C,  1  s  y  <  112.  (3.6.39) 

Oj  Pi  P2 

Thus  from  (3.6.32),  we  sec  that  for  h,  <  H^  and  C^  log  n^^  j  <  t:, 
1  -  aj  \-¥  Chi 


s  C,  for  Ci  log  /12  ^  ;  <  "2 

Hence  by  (3.6.32)  and  (3.4.41),  (3.6.38)  becomes 

In 

V         lj  "i       )'  "2 


/I  ^"w  Pi  '^t 

Pi  fly  (1  -  fly   )  "^ 


C  a,  ^         r  r 

^j  -  ^j  s        /     a      ^  ^  s  -p— ,  for  Ci  log  n^^  j  <  tj.     o 
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i  3.7        Clnsteiiiig  of  Singular  Valnes  of  B 

Using  the  lemmas  in  the  previous  section,  we  are  able  to  prove  that  the  singiilar  values 
of  B  are  clustered.  More  precisely,  we  will  show  that  B'  B  =  (1  +  —^)  I  +  L  +  U,  where 

L  is  a  matrix  of  low  rank  and  1/  is  a  matrix  of  small  I2  norm. 
Recalling  (3.4.15)  and  using  (3.6.10),  we  have 

b:b,  =  I+  ^{Q'2R,Q2^,  +  '^,Q'2R:<i>,} 

+  P2  (7^)'  *,  Ql  G.  Q2  <l».  +  (■^)^  *.  O2  L,  Q2  <P,.  (3.7.1) 

*2  ^2 

For  simplicity,  we  will  use  Lj  to  denote  matrices  of  rank  less  than  or  equal  to  j.  Using 
(3.4.13)  and  (3.4.8),  (3.7.1)  becomes 

b;  fl,  =  /  +  -^  { r,  +  ( pj  -  -J- )  /  +  P2  (■^)  (!>,  r, }  4., 

*2  P2  *2 

+  y  (^)  {  G2  z,  C?2  4>,  +  4>,  G2  2>. }  +  A 


where 


-D  +  W  +  L3, 


O  -  /  +  7^  {  r,  +  (  P2  -  -J-  )  /  +  P:  (■^)  <!.,  r, }  <D, 

^2  P2  *2 


is  a  diagonal  matrix,  and 


W'  -  I  (■^)  {  Ql  Z,  Q2  «!>,  +  «D,  Ql  z;  4),  }. 


2  'X 


By  (3.6.13),  W  = 


,  where 


w,'\  (^)  {  q;  z,  c,  <!>,  +  4)^  q;  z;  <d,  } 


(3.7.2) 


(3.7.3) 


(3.7.4) 


is  a  matrix  of  order  nj-l.  By  (3.6.15)  and  (3.6.19), 


^^ '  IT  ^  ^* "  2  ^  *^  ^^'  ^'  "^^  +  q;  z;  g,  *, ) } 


^2 


(3.7.5) 
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By  (3.4.43),  D  = 


^P     \-i 


o;-i  1 


Here  D.  is  a  diagonal  matrix  of  order  n2- 1  and  is  given  by 


D^  -  diag  ( <ii d„^.^ ) 

Combining  this  with  (3.7.5),  (3.7.2)  gives 


(3.7.6) 


b:b,= 


+  L, 


D,  +  j^iW.-W,)  0„^_i 


o;-. 


+  L. 


(3.7.7) 


Corresponding  to  (3.6.28),  we  define  the  approximation  D.  of  D   as 


D^  -  diag  (  Ji (i„^_i  ) 


\,  '^^ 


P2 


P     P  '      P' 


(3.7.8) 


LEMMA  3.7.1 


For  o  2  1,  there  exists  a  Cj  >  0  such  that  for  A,  <  Ht,  i  =  1,2, 


Here  C;  and  Hi  are  given  by  (3.6.31)  and  lemma  3.6.3  respectively. 


(3.7.9) 


Proof:         By  (3.7.6)  and  (3.7.8), 


dj-dj={-^){lj^{?^--^)I  +  p^  (r^)  -ry  «t>y  }  (  <|)y  -  *y  ) 


P2 

+  (t^)^  P2  7y  <^y  (  <}>y  -  «^y  ).  1  =S  7  <  n^. 

*2 

Notice  that  by  (3.4.40)  and  the  fact  that  oj  <  1,  we  have 

1j  *y  =S  (C  7y  )*  +  «r^  1  ^  7  <  "z- 

Thus  by  (3.4.28),  (3.4.19)  and  (3.4.35), 


(3.7.10) 


(3.7.11) 
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7;  4»y  =s  C,  1  s  y  <  n^.  (3.7.12) 


Hence  by  (3.6.33), 


yj  ^j  s^j^j^  C,  Ci  log  n2^j<  nj.  (3.7.13) 

Thus  by  (3.1.8).  (3.4.28)  and  (3.6.33),  (3.7.10)  gives 

C  h 

\dj-dj\<C\^j-^j\<  -^.  C;  log  n^rSjK  n2.   a 


LEMMA  3.7.2 

For  a  a  1,  there  exists  a  constant  Cj  >  0  such  that  when  h,  <  H^,  i  =  1,2, 

\dj-(l+^)\<C,  hr'  Ij  l^j<  «2-  (3.7.14) 

Proof:         By  (3.7.8)  and  (3.6.29),  we  have 

^  ^2        Pi^J  ^  P2  ^2  Pi 

By  definition  (3.4.18),  {  =  ^^.  Thus 

P2^1 

'  ^2    Piyj  p2  5 

,    ,     ^1     ,     ^1    (Pl-  fly)    r       1°L1L  J.  r  f  1    ^  -L  /-  M 

=  1  +  —  +  - —      /    { -— -frr  +  ?  ( P2  -  — )  +  ( Pi  -  fly  )  }• 

^2  ^2         Pi  ^J  i  Pi        fly  P2 

By  (3.4.24),  this  can  be  rewritten  as 

/-  1  J.   ^1  ^      j  -   ^1        fly       /  l~Pifly    ,   r  /-    1  \  J.       I 

(  1  +  —  )  -  <f,  =  —  T— f—  {  — — ^  +  ?  (  —  -  P2  )  -  Pi  +  fly  } 
Xj  X2    1    Pifly  Pi  P2 


=  T^  T:^  {  (  -T  -  Pi  )  +  5  (  f  -  P2  )  }.  (3.7.15) 

\2     1     Pifly  Pi  P2 

Since  Oj,  p^  <  1, 

0  s         ''Z         s  — ^,  1  s  J  <  nj.  (3.7.16) 

1  ~  Pi  fly       ^  ~  fly 

Hence  we  have 

I  '/y  -  ( 1  +  r- )  I  ^  -r  rV  i  ( 7-  -  Pi )  +  c  ( -f  -  p^ )  i-  1  ^  j  <  "2-     (3.7.17) 

Xj  X2      l-fly  Pi  P2 


rf^.(l+iL)|^_EL!!^+    ^^^'~' 


Here 


and 


Hence  we  have 


Corollary  3.7.3 

For  a  a  1  and  A,  <  ^^4,  i  =  1,2, 


^2 


(3.7.18) 
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Notice  that  by  (3.4.39)  and  (3.4.41), 

On  the  other  hand,  by  the  definition  of  {  and  pj, 

l(^-Pi)  +  {(f-P2)l=7-l(i-pn  +  -^(i-pi)l 
Pi  P:  Pi  J'-i 

=  — 13:*?  +  — P:*?!.  (3.7.19) 

Pi  M-i 

Thus  by  (3.1.6),  (3.1.8)  and  (3.4.35),  we  have 

^  I  (  —  -  Pi)  +  U  7-  -  P2 )  I  ^  C  •  A?.  (3.7.20) 

^2         Pi  P2 

Hence  the  lemma  follows  by  putting  this  and  (3.7.18)  into  (3.7.17).     a 


Qjmbining  lemmas  3.7.1  and  3.7.2,  we  have,  for  A,  <  Hi,  i  =  1,2, 


^2  (Ci  log  nj)^        Ci  log  nj 

=  C4  Aj'  /  log  '»2«  f°^    Ci  '°8  "2  ^  y  <  '»2-  (3.7.21) 

tto  -  min  {  o  -  1  ,  1  }.  (3.7.22) 

^*  -  7-  (r  1^1.    -^  ^3).  (3.7.23) 

Ci     Ci  log  n2 


C;,  =  (  1  +  ^  )■/  +  Lc.ioin,  +  ^p.  (3.7.24) 


where  U^  is  a  diagonal  matrix  with 

II  t^Jl2  <  C4  a"»  /  log  nj.    a  (3.7.25) 
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Nezt  we  claim  that  the  eigenvalues  of  Wi,  i  =  1,2,  and  hence  of  W^,  are  clustered 
around  zero.  From  lemma  3.6.2,  the  W,  are  positive  semi-defmite  matrices.  Thus 


triW,)'  "Z  (Wt)jj   =  I,  h(Wi)^0.  «•  =  1,  2.  (3.7.26) 

where  X,  (  W, )  a  0  are  the  eigenvalues  of  Wi. 


LEMMA  3.7.4 

For  a  a  1,  there  exists  a  C5  >  0  and  0  <  i/j  s  H^,  such  that  for  all  h^  and  Aj  <  H^, 

tr(W,)<Cs  log  nj,  i  =  1,  2.  (3.7.27) 

Proof        By  (3.6.23)  and  (3.6.9) 

(  W'l  )yy  =  (  ^2  )yy  =  "T  7~  '^'  '''2^  2:  ,  -'  ^"\,>   1  ^  ;  <  "2-        (3.7.28) 

Using  (2.3.21)  -  (2.3.24)  and  (3.6.17),  it  is  easy  to  verify  that 

w       si        12.1  2    *i^^  9ij 

(\j)'  =  ^  T-  sm^  'i'l;  =  T-  — T^.  1  ^  /  <  ni- 

Pi    "1  "1     Pi  C  7; 

Thus  (3.7.28)  becomes 

S  Pi  P2  "i":  ;-i  (  7;  +  7y  )•' 

By  (3.4.41),  yj  a  ■^.  Similarly,  by  (3.6.17),  7,  a  -i  -^.  Using  these  and  the  fact  that 
sin  61;  =  sin  -^  ^  —  and  sin^  \\i2j  s  1,  (3.7.29)  becomes 


Ljr}         "i 


(4-)' 


(  ^'  )yy  ==  T-^ 2 : — ■ — ; .         1  ^  ;  <  "2.       (3.7.30) 

"2  "1 

Let  yj  =  {  (-^)^.  Consider  the  function  /,(x)  -      ,        ,     .  The  maximum  of  /,(x)  for 
"2  ^  (y/  -f-  x^)^  ^ 

X  a  0  is  at  a:  =  >y  where  /y(>'y)  =  (2  >y)"^  Hence  for  I  ^  j  <  /12. 
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( "^i  )}J 


L^ 


4  pi  P2  n^ 


f — dx  +  —  max  — 


4  Pi  P:  "2    I  4  ^'y        2  rti  y/  J 


4  pi  P2  n2 


n,  IT 


ni 


4;V^  ^  l^nj^ 


16piP2>        8piP2;^' 


where  Cq  is  given  in  (3.1.6).  Hence  for  i  =  1,2, 


tr{Wi) 


loPiP:  y-i 


8  PiP:  y-i 


16  Pi  P2 


(  log  n2  +  TT  )  + 


^0^        V   ,-2 

0P1P2  y-1 


where  -y  is  the  Euler  constant.  Using  the  Euler  formula 


we  have 


rr  (  W,  )  s 


lopi  P2  I. 


log  /12  +  "y  + 


3  r 


»  =  1,2. 


Thus  there  exists  a  Cj  such  that  (3.7.27)  holds  for  all  sufficiently  large  n^.    a 


(3.7.31) 


From  this  lemma,  we  see  that  the  number  of  eigenvalues  of  W,  that  are  larger  than  1 
cannot  exceed  C5  log  n^.  Since  the  W,  are  symmetric,  they  can  be  diagonalized.  By  separating 
those  eigenvalues  that  are  greater  than  1  from  those  that  are  smaller,  we  have 

CoUorary  3.7.5 

If  a  &  1  and  h^,  h^  <  H^,  then 


Wi  -  Vi  +  Lc^iogn^,  »  -  1,2. 


(3.7.32) 
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Here  the  V,  are  positive  semi-definite  matrices  with  ||  V,  Hj  <  1,  or  equivalently, 


x'V,x       ^ 
0  s  — -i-  s  1, 


for  X  €  R'''~\ 


X    X 


and  the  L'c  iq.  „  are  positive  definite  matrices  with 


(3.7.33) 


(3.7.34) 


Define  the  constant 


Cj  -  2  Cj  +  Ci. 


(3.7.35) 


By  (3.7.24)  and  (3.7.32),  we  get 


D,+  ^(W,-W,)='I+^(I  +  V2-V,)  +  Lc^,,^„^  +  Up,        (3.7.36) 


where 


*"C,logn,         *-Cjlog»ij        ^Cjlogn,  ^  ^-C^log/ij- 


(3.7.37) 


Using  (3.7.33),  we  see  that,  the  eigenvalues  of  /  +  — ^  ( /  +  V2  -  V^ )  lie  in  the  interval 
[  1  ,  1  +  2  -i-  ].  Thus  by  (3.7.7)  and  (3.7.25),  we  have 


Corollary  3.7.6 

If  a  a  1  and  h,  <  H^,  i  =  1,2,  then 


Here 


is  symmetric  with 


B:B,  =   V,  +   U,  +  Lc.iog„,  +  8- 


V,' 


/  +  ^  (/  +  V2  -  VJ  0,^.1 


"«,-! 


1+    T^ 


(3.7.38) 


(3.7.39) 


x/  V, )  c[  1 . 1  +  2  r-  ].        1  ^  ;•  s  «2. 


(3.7.40) 


Us 


86 


o;-i  0 


(3.7.41) 


is  diagonal  with 


and 


U,  \\2  <C^  hi'/ log  n^,  (3.7.42) 


£c.io,n,*8-^'c.iog«,  +  A-    °  (3-7.43) 

From  (3.1.8),  (3.7.40)  and  (3.7.42),  we  see  that  there  exists  d^  and  ^fj  >  0,  independent 
of  hf,  such  that  for  all  h,  sufficiently  small,  all  the  eigenvalues  of  V^  +  t/,  lie  in  the  interval 
[di,d2].  Notice  that  the  matrices  V,  and  U,  are  symmetric,  hence  Lc.iog/.,+8  •*  ^^° 
symmetric.  Qearly,  Lc  [o,„  +8  can  be  written  as  a  sum  of  (  C^  log  n2  +  8  )'s  symmetric  rank 
one  matrices.  Thus  using  the  Cauchy  interlace  theorem  (see  Parlett  [34])  repeatedly  for 
(  C^  log  nj  +  8  )  times,  we  get 

Corollary  3.7.7 

For  a  a  1,  there  exists  d^,  rfj  >  0.  and  Hf,  <  H^  such  that  for  Ai,  *:  <  ^5.  all  except 
(Cg  log  ni  +  8)  eigenvalues  of  B]  B,  lie  in  the  interval  [  </i ,  t/j  ]•     d 

Using  lemma  3.3.2,  we  immediate  get 

Corollary  3.7.8 

For  a  2  1,  there  exists  b^  and  b^  >  0,  independent  of  A,,  such  that  for  A^,  hj  <  Hf,,  all 
except  (Cg  log  n-^  +  8)  eigenvalues  of  fl*  fl  lie  in  the  interval  [  i»i  ,  *:  ]•     '-' 

Using  this  corollary,  we  are  able  to  derive  an  upper  bound  for  the  number  of  iterations 
required  to  attain  a  given  accuracy. 
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i  3.8         The  Rate  of  CooTersencc  when  a  ^  1 


In  this  section,  we  will  show  that  the  number  of  iterations  required  to  attain  a  given 
accuracy  increases  no  faster  than  OOog^/ij).  We  first  derive  an  upper  bound  for  (2.2.3). 


LEMMA  3.8.1 

Let  X  be  the  solution  to  B'  B  x  =  B*  b  and  Xj  be  the  j-th  iterant  of  the  ordinary 
conjugate  gradient  method  (2.2.1)  applied  to  this  normal  equation.  If  the  eigenvalues  {  8y  }  of 
B'  B  axe  such  that 


0<  8 


s  8-_i  <  8    -  *i  5    ••  •    rs  h         -  fej  <  i  rs 


1-1 


Fl,-flH 


\' 


then 


B(x-x,) 


■J)  112 


Bix-xo)  lb  [b  +  1 


b-1 


J-p-1 


max 


(3.8.1) 


Here 


(bA 


^*W 


a:  1.    n 


(3.8.2) 


The  proof  can  be  foimd  in  Van  der  Vorst  [39],  see  also  Daniel  [13]. 
Notice  that  for  n2-q  +  l  s  j  s  nj  and  8  €  [  6^  ,  ij  ],  we  have. 


8^-8 
8, 


:s  1  - 


^^1. 


(3.8.3) 


Thus  (3.8.1)  can  be  rewritten  as 


|£(x- 
\B(. 


(3.8.4) 


From  lemmas  3.5.5  and  3.3.2,  we  see  that  ifo>lorifa  =  l  such  that  (3.5.10)  holds,  then 


8,  a  CAf ,  1  s  y  s  nj 


(3.8.5) 


Thus  f or  1  s  y  s  p  and  8  €  [  fr^ ,  frj  ],  we  have. 


Hence  (3.8.4)  becomes 


Bix-xj)  Ilj 


ll^(^-Xo)ll2 

Since  log  (1-x)  s  -x  for  0  s  x  s  1,  we  have 
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8-8, 
0  s  —r-^  s  C  nf.  (3.8.6) 


c.  „|. .  [i  - -l-p"  =  c^  „^  . /--'^^'^^-i^^. 


l|5(x-xo)  11:  ^ 

Thus  given  arbitrary  e  >  0,  an  upper  bound  for  the  number  of  iterations  required  to  make 


s  e 


l|fl(x-X,)||; 
\\B(x-Xo)\\2 

is  given  by 

;o  -  ^Y'  ^  '''°*  ''  "^  ^  ^^°^  /i2  -  log  c  }  +  p  +  <7.  (3.8.8) 

Notice  that  by  corollary  3.7.8,  6  =  -r-  is  independent  of  nj.  Moreover,  we  have 

p  +  ^  s  Cj  log  rt2  +  8  =s  (2  C5  +  C  J  log  /I2  +  8.  (3.8.9) 

Using  (3.7.34)  and  (3.7.37),  we  also  have 

p  ,q  ^  iCs  +  CJ  log  nj  +  8.  (3.8.10) 

Thus  the  right  hand  side  of  (3.8.8)  is  bounded  above  by  C  •  log^n^.  Hence  we  have, 

THEOREM  3.8.2 

Assume  that  either  o  =  1  and  (3.5.10)  holds  or  a  >  1,  then  the  nimiber  of  iterations 

\\B(x-xA\U  ,    ^ 

required  to  reduce  -tt-tz \  1,    by  a  given  accuracy  can  grow  no  faster  than  0(log^n;)  as  h, 

\\B{x-xq)  II2 

tends  to  zero.    □ 
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}  3.9        CoDclodlng  Remarki 

In  this  section,  we  will  generalize  the  results  in  the  previoiis  sections  to  the  case  where 
X,  a  jji,.  We  will  also  discuss  the  multi-server  case  where  s,  >  1.  Let  us  begin  with  the  case 
where  X,  >  jt,. 

i  3.9.1         The  Case  when  X,  >  it.,  and  a  2  1 

Going  over  the  proof  in  5  3.3  -  5  3.8,  we  see  that  when  X,  >  ji,,,  some  of  the  arguments 
are  no  longer  valid  or  need  further  explanations.  They  are  (3.3.2)  •  (3.3.6),  (3.4.24), 
(3.4.25),  (3.4.33),  (3.4.38),  (3.5.9)  and  (3.7.17).  We  will  see  in  the  foUowing  that  these 
equations  can  easily  be  modified  to  prove  the  same  assertions.  The  idea  is  to  expand  p, 
around  A,  =  0.  Let  us  assume 

^  =  1  -K  p,  A,»  >  1,  (3.9.1) 

where 

a  a  1  and  0  <  p,  s  1.  (3.9.2) 

Here  a,  p.,  and  p,  are  assimied  to  be  constant  independent  of  h,.  By  defmition  (3.1.12) 

P/  =(  —  )'*=(  1  +  P,  A,"  )'*  (3.9.3) 

and  hence 

p,'      =  e^'-  rse^^'"'      s  (  1  -  H  p,  hr'  V^-  (3.9.4) 

In  the  following  paragraphs,  we  will  replace  or  give  another  proof  for  the  equations  listed 

above. 

First  we  observe  that  by  (3.1.11)  and  the  fact  that  p2  >  1,  (3.3.2),  (3.3.3)  and  (3.3.5) 
can  be  replaced  respectively  by 

\\o{'S2Q2\\2  =  pI'~\  (3.9.5) 
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a:  Ql  SI'  II2  =  1.  (3.9.6) 


and 


P2""'  aj^UjS  P2'  ^  <Ty,  1  s  j  s  flj-  (3.9.7) 

By  (3.9.4)  and  (3.9.7),  (3.3.6)  can  be  replaced  by 

(  1  -    %  P,  hr'  )  <Ty  S  dy  S   (  1   -   14  p,  A,»-l  )-l  (Ty,  1  «  ;   S  Hj.  (3.9.8) 

In  particular,  (3.3.7)  still  holds. 

p,    —   Of 

Next  we  claim  that  — = ^  in  (3.4.24)  is  a  decreasing  function  of  ;  f or  1  s  7  <  nj. 

Pi7y 

Since  p^  >  1  >  Oj,  the  left  hand  side  in  (3.4.24)  is  positive.  Hence 

l>Pifly,  ltfi></i2-  (3.9.9) 

Since  Oj  is  a  decreasing  function  of  j,  we  see  that  the  right  hand  side  in  (3.4.24)  is  a 

decreasing  function  of  ;. 

1  -  a, 
Similarly, ^  in  (3.4.25)  is  also  a  decreasing  function  of  j.  In  fact,  by  (3.4.24) 

l-aj_  ^  PTZl  ^  Ll£1^  =  ElZl  +  _Li_.  (3.9.10) 

fy  Pi  yj  Pi  1j  Pi  Ij  Pi  -   Oj 

Since  p^  >  1  >  Oy  and  Oj,  yj'^  are  decreasing  functions  of  j,  the  assertion  follows. 

The  first  inequality  in  (3.4.33)  follows  from  (3.9.9)  and  (3.4.32). 

Instead  of  getting  an  upper  bound  for  pf^  in  (3.4.33),  we  now  need  an  upper  bound  for 
p,.  By  (3.9.3)  and  (3.9.2), 

p,  s  (  1  +  p,  A,-  )  =s  2,  i  =  1,2.  (3.9.11) 

Next  we  modify  (3.4.38).  Notice  that  by  (3.9.3),  Pi  <  1  +  0(A,»).  Thus 

^         '       ^       +  Oih?),  l^j<  1:.  (3.9.12) 


1   -    Pl  fly  1    -    fly 

Hence  (3.4.38)  can  be  replaced  by 


<|»y  <  Tr-^  +  -^  +  Oihn,  1  s  y  <  nj.  (3.9.13) 

■^    ";     yj  "1 

It  is  obvious  that  the  extra  term  0(hf)  does  not  affect  the  conclusion  of  lemma  3.4.3. 
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Since  pj  >  1  >  cos  e^,  thus  (3.5.9)  is  positive  for  all  a  a  1.  Hence  in  theorem  3.5.6  and 
3.8.2,  there  is  no  need  to  assume  that  (3.5.10)  holds  even  when  a  =  1. 

Finally  replacing  (3.7.16)  by  (3.9.12),  we  sec  that  (3.7.17)  can  be  replaced  by 

|rf-(l+^)|^^{  -^J—  +  om  }  I  J-  -  p,  -h  {( J-  -  p,)  |.    (3.9.14) 
It  is  also  clear  that  the  extra  term  0{hf)  does  not  affect  the  conclusion  of  lemma  3.7.2. 
Using  these  modified  equations,  it  is  easy  to  prove 

THEOREM  3.9.1 

If   (3.9.1)   and   (3.9.2)   hold,   then   the   number   of  iterations   required  to  reduce 

- — ^^ ^^-Ir-  by  a  given  accuracy  can  grow  no  faster  than  ©(log^n,)  "  */  tends  to  zero,    p 

\\B(,x-x^)\\^ 

9  3.9.2         The  Case  when  X,  =  p., 

When  X,  =  ji,,,  i  =  1,2,  by  (3.1.12)  and  (3.1.3),  we  have 

p,=  (^)'»=l,  /  =  1.2.  (3.9.15) 

P,  =  0,  i=l,2,  (3.9.16) 

and  a  is  arbitrary.  Moreover,  by  the  definition  (2.1.7)  and  (2.1.8), 


fl/  =  (-)^  '  =  1.2.  (3.9.17) 

Using  (3.9.15)  -  (3.9.17),  it  is  straightforward  to  check  that  all  the  results  in  5  3.3  -  5  3.8  are 
still  valid.  In  particular,  theorem  3.8.2  still  holds  in  this  case. 

We  remark  that  when  X,  =  pi,,  i  =  1,2,  we  are  in  fact  preconditioning  an  oblique  BVP 
by  a  Neumann  BVP.  To  see  this,  we  first  observe  that  by  (2.1.5)  and  the  fact  that  X,  =  jj,,, 

Ao  =  X,  (Gi  ®  I„)  +  \2  (/n,  ®  G2),  (3.9.18) 

where  the  Gi  are  of  order  n,  and  are  given  by 

G,  -  tridiag  (-1,2,-1)-  e, el-  e„^  *;,        i  =  l,2.  (3.9.19) 
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Thus  Aq  is  the  5-pomt  formula  for  the  Neumann  problem 


^'0^^ 


an 


in  [0,lp, 
on  a[0,l]2. 


(3.9.20) 


with  mesh-size  h,  =  (n,-l)"^,  i  =   1,2.  Here  ti  denotes  the  unit  outward  normal  of  the 
square  [C,l]^.  On  the  other  hand,  from  (2.3.6)  and  (3.4.1),  we  see  that 

/?o  =  i\  '<)  ®  '^1  =  i\  '<)  ®  (Xi  •  R,),  (3.9.21) 

where   R,    is    given   by    (3.4.1)    with   P2  =  1    there.    Thus  R,   is   a   forward   difference 

approximation  of  3^.  It  is  easy  to  see  that 


A    —   An   +    An 


(3.9.22) 


is  a  5-point  formula  for  the  oblique  BVP 


x^  i!2.  +  X:  -^  =  0 
37 


in  [O.lp, 
on  a[o,ip. 


(3.9.23) 


Here  7  is  a  directional  vector  defined  on  3[0,1]^  and  is  given  by 


|t1  +  T  if X  =  1, 


(3.9.24) 
where  t  is  the  unit  tangential  vector. 

Thus  the  preconditioning  of  A  by  Ag  is  the  discrete  version  of  preconditioning  the 
oblique  problem  (3.9.23)  by  the  Neumann  problem  (3.9.20).  Since  theorem  .''.8.2  holds  in 
this  case,  the  matrix  Aq  is  a  very  good  preconditioner  for  A .  Moreover,  by  the  resixlts  in  9 
3.7,  we  see  that  the  singxilar  values  of  the  preconditioned  matrix 


A    Aq       —     /     +     «Q   Aq 


(3.9.25) 


are  clustered  around  (  1  +  — ^  )'*. 
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i  3.9.3        The  Molti-Serrer  Case 


For  5,  >  1,  then  instead  of  (2.3.28),  the  underlying  continous  equation  is  of  the  form 


+    2Ai(jiJli-  Xi)p,    +    2h2(s2il2-^2)Py    ~    0. 


(3.9.26) 


in  the  region  where  s^h^:s  x  =  i  h^^  1  and  jj  Aj  ^  y  =  ;  *:  ^  1-  In  other  part  of  the 
square  [0,1]^,  the  equation  has  variable  coefficients.  Thus  if  s,  are  constant  independent  of  A,, 
then  for  sufficiently  small  A,,  a  sensible  limit  to  consider  is 

s,  »i,  =  X,  ±  r\,  h^,  i  =  1,2,  (3.9.27) 

for  some  constants  ti,  and  a.  We  remark  that  when  */  =  1  and  t|,  s  p,„  j  =  1,2,  this  reduces 

to  the  limit  we  discussed  previously. 

When  Si  increases  proportionally  to  n,,  (3.9.27)  may  not  be  the  right  limit  to  consider. 

We  remark  that  in  this  case,  the  diagonal  matrix  5,  that  is  used  in  the  transformation  (3.3.1) 

is  no  longer  well-conditioned.  In  fact,  by  (2.1.7),  the  last  diagonal  entry  of  S,  is  given  by 


'd„.  =  a, 


1    fhiyi  f-hj—\"r'r^ 

S,  I        ill  Si  ill 


I  =  1,2. 


(3.9.28) 


Thus  by  (3.9.27)  and  Stirling's  formula,  we  have,  for  a  a:  1, 


'd„,  =  a, 


(',)'' 


^'* 


s,l 


1     -^s. 


-Cs,*  '   ^    ,       »■  =  1.2.  (3.9.29) 

Thus  the  condition  number  of  5;  increases  exponentially  for  all  a  a  1. 

We  remark  that  the  ordinary  conjugate  gradient  method  (2.2.1)  indeed  converges  within 
a  few  steps  when  *,  =  constant,  while  it  diverges  when  5,  =  n,- 1,  see  8  4. 


This  concludes  our  discussion  on  the  model  problems. 
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Section  4.  Nomerical  Results 


In  this  section,  we  report  on  the  numerical  results  for  the  models  discossed  in  5  2.  All 
the  computations  were  carried  out  on  the  Cyber-760  at  the  Mathematics  and  Computing 
Laboratory  of  the  Courant  Institute.  Single  precision,  between  fourteen  and  fifteen  decimal 
digits,  was  used  throughout.  Craig's  method  used  in  these  computations  is  a  version  of  the 
ordinary  conjugate  gradient  method  (2.2.1)  applied  to  the  normal  equations;  see  Elman  [15]. 
For  a  given  tolerance,  convergence  is  said  to  occur  at  the  ifc-th  step  if 

i^  s  tolerance. 


II  ro  112 

while 


*~^   ^-  >   tolerance. 


II  '•o  lb 
Here  r^,  given  by  (2.2.1)  or  (2.2.5),  is  the  residual  at  the  Jt-th  step  and 


WxWl'^^xj       forall  x6/?- 
The  initial  iterant  xq  is  chosen  to  be  identically  zero. 


(I)         Two-queue,  one-direction  overflow  model. 

This  is  the  model  discussed  in  9  2.3.1.  Iterative  methods  developed  in  9  2.2  are  used  to 
solve  the  matrix  equation  B  y^  =  b.  Here  B  and  b  are  given  by  (2.3.12)  and  (2.3.13) 
respectively.  Let  us  first  consider  the  case  where  X,  <  p.,  and  j,  =  1.  Tables  la,  lb,  2a  and 
2b  give  the  number  of  iterations  required  to  converge  for  two  different  choices  of  3,,  where 
3,  is  the  parameter  defined  by  (3.1.3)  or  (3.9.1).    We  see  that  the  convergence  rate  is 
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mdcpendent  of  p,.  The  results  also  show  that  the  case  where  a  =  1  is  critical  for  Craig's 
method.  More  precisely,  when  a  >  1,  the  number  of  iterations  seems  to  be  constant  as 
n,  -  «.  When  a  =  1,  it  increases  like  0(log  n,).  This  is  consistent  with  (3.7.22).  When 
a  <  1,  Craig's  method  does,  not  converge  for  sufficiently  large  n,.  However,  from  theorem 
3.2.1,  we  see  that  for  sufficiently  large  n„  there  is  no  need  to  solve  the  matrix  equation 
numerically. 

Next  we  consider  the  case  where  X,  a  p.,  and  *,  =  1.  Tables  3a  and  3b  give  the  results 
of  our  method  for  different  choices  of  a.  In  the  tables,  a  =  «  represents  the  problems  where 
X,  =  ii,.  We  see  that  the  convergence  rate  in  this  case  is  almost  the  same  as  in  the  case  where 
X,  <  ji,,.  We  note  that  in  all  the  cases  considered,  our  method  converges  much  sooner  than 
the  bound  given  in  (3.8.8).  Tables  4  and  5  provide  an  explanation.  In  tables  4a  and  4b, 
{8,}?£i  are  the  eigenvalues  of  the  iteration  matrix  B  B'  arranged  in  ascending  order.  We  see 
that  the  8,  are  more  clustered  than  suggested  by  our  results  in  9  3.7.  In  tables  5a  and  5b  we 
give  the  largest  and  smallest  eigenvalues  oi  B  B  '  for  increasing  values  of  n,.  We  note  that 
k(B)  goes  to  infinity  because  of  the  presence  of  some  small  and  some  large  singular  values. 
However,  k(B)  is  much  smaller  than  the  bound  obtained  in  theorem  3.5.6. 

Tables  6a  and  6b  give  the  time  reqiiired  for  the  different  stages  of  the  algorithms. 
"Initialization"  refers  to  the  generation  of  <I>,  5,,  Q,  and  the  right  hand  side  b.  "Iteration" 
refers  to  the  solving  of  By^  =  &  by  the  iterative  methods.  "Generating  p"  refers  to  the 
computation  of  p  in  (2.3.14).  Alternative  C  in  Appendix  A.l  is  used  in  computing  the 
matrix-vector  product  Bd  in  each  iteration.  We  note  that  the  timings  are  consistent  with  the 
theoretical  estimates.  In  fact,  from  9  2.3.1,  we  see  th<;t  the  work  and  storage  requirement  per 
iteration  are  0(n,  log  n,)  and  0(/i,)  repectively.  The  generation  of  p  requires  an  extra 
0{nf  log  n()  work  and  nf  storage.  We  remark  that  substantial  saving  would  result  if  only  a 
few  entries  of  p  are  needed. 

Tables  7a  and  7b  give  the  results  for  a  family  of  multi-server  problems  that  satisfy 
(3.9.27).  As  remarked  there,  when  s,  =  0(n,),  (3.9.27)  may  not  be  a  good  limit.  Indeed, 
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Craig's  method  diverges  when  j,  =  n,  -  1.  We  note  that  in  the  multi-server  case,  *,  >  1,  we 
cannot  use  the  Fast  Fourier  Transform.  Hence  the  work  and  storage  requirement  per 
iteration  are  2nf  +  0{n,)  and  n}  +  0(ii,)  respectively. 

Let  us  compare  our  method  with  other  conventional  methods.  As  mentioned  in  9  2.4,  a 
typical  approach  to  this  kind  of  queueing  problems  is  to  fix  one  degree  of  freedom  in  the 
solution  p.  Or  equivalently,  one  row  in  the  generating  matrix  A  is  deleted  and  the  resulting 
nonsingular  system  solved.  Let  us  consider  the  case  where  this  nonsingular  system  is  solved 
by  a  classical  iterative  method  such  as  the  point  SOR  method,  see  Kaufman  [25].  Since  the 
graph  of  the  generating  matrix  A  is  the  same  as  the  graph  of  a  discrete  Lapladan,  it  is  dear 
that  the  point  SOR  method  requires  7  nf  +  0(n,)  work  and  nf  +  0(n,)  storage  spaces  per 
iteration.  We  note  that  this  method  converges  very  slowly.  Tables  8  and  9  give  the  numerical 
evidences.  Table  8  lists  the  number  of  iterations  required  by  the  two  methods  and  table  9 
compares  the  time  required  in  seconds.  In  the  tables,  w'  denotes  the  optimal  relaxation  factor 
up  to  three  decimal  points  obtained  experimentally.  We  sec  that  the  point  SOR  method  has  a 
very  slow  convergence  rate  espedally  when  *,  is  small.  In  the  three  cases  we  considered,  our 
method  converges  5  to  32  times  faster  than  the  point  SOR  method. 

Let  us  now  consider  the  approach  of  solving  the  nonsingular  system  by  a  direct  method. 
Since  the  band-width  of  the  generating  matrix  A  is  n,,  the  band  Gaussian  elimination  will 
require  0(nf)  work  and  0(nf)  storage  spaces.  A  direct  method  that  takes  advantage  of  the 
separablity  of  the  problem  will  reduce  these  counts  to  0(nf)  and  0(nf)  respectively,  see 
Kaufman  [25].  Since  the  graph  of  A  is  the  same  as  the  graph  of  a  discrete  Lapladan,  nested 
dissection  method  can  also  be  used,  see  George  and  Liu  [18].  The  counts  for  this  method  are 
0(nf)  and  0(nj  log  n,)  respectively.  Let  us  remark  that  our  preconditioned  system  B  y^  =  b 
can  also  be  solved  by  direa  methods.  In  faa,  we  can  compute  and  store  B  by  ufing  (2.3.17). 
This  would  require  0{nf)  operations  and  0(nj)  storage  spaces.  Thus  we  see  that  solving  the 
preconditioned  system  B  y^  =  b  by  conjugate  gradient  type  methods  requires  the  least 
amoimt  of  work  and  storage. 
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(II)         Other  Separable  Preconditioners 

Tables  10a  and  10b  report  on  the  performance  of  a  family  of  preconditioners.  The 
parameter  P  in  the  tables  indicates  the  preconditioners  we  are  using.  If  3  #  0,  the 
preconditioner  is  A^  which  is  nonsingular  and  is  given  by  (2.4.5).  If  3  =  0,  the 
preconditioner  is  defmed  by  (2.4.6).  When  3  =  Aq  the  preconditioner  is  Aq  itself.  Recall  that 
when  3  =  1,  the  preconditioner  resembles  a  Dirichlet  problem  while  when  P  =  0,  it 
resembles  a  Neumann  problem.  We  see  that  the  number  of  iterations  decreases  as  |  p  1  -  0. 
However,  we  remark  that  for  sufficiently  small  p,  arithmetic  overflow  will  occur.  This  is 
because  by  (2.4.4)  and  (2.4.5),  the  smallest  eigenvalue  of  Ag  tends  to  zero  as  |  P  |  -  0. 

(HI)        Numerical  Results  for  other  queueing  models 

Tables  11a  and  lib  give  the  number  of  iterations  required  for  convergence  for  the 
model  we  discussed  in  §  2.3.2,  i.e.,  the  2-queue  model  with  overflow  permitted  in  both 
directions.  We  see  that  our  method  does  not  converge  when  j,  =  n,  -  1.  However,  when  the 
J,  are  constant  and  independent  of  n,,  the  performance  is  quite  good.  The  number  of 
iterations  seems  to  increase  proportionally  to  log  n,  and  the  convergence  rate  is  almost 
independent  of  a. 

Tables  12  gives  the  results  of  our  method  when  applied  to  the  3-queue  model  discussed 
in  §  2.5.1.  The  number  of  iterations  also  increases  like  order  0(log  n,).  Table  13  shows  the 
time  required  in  each  phase  of  the  algorithm.  The  time  per  iteration  is  slightly  better  than  the 
theoretical  estimates  obtained  in  Appendix  A.3,  see  (A.3.16).  Tables  14  and  15  compare  our 
method  with  the  point  SOR  method.  We  see  that  our  method  performs  much  better  than  the 
point  SOR  method  especially  when  the  j^  are  small. 

Finally,  we  consider  the  model  discussed  in  9  2.6.  Tables  16a  and  16b  give  the  number 
of  iterations  required  for  convergence  using  the  method  introduced  in  9  2.6.1.  In  all  cases, 
we  see  that  the  number  of  iterations  increases  at  most  like  0(log  n,). 
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For  easy  reference,  in  the  foUowing  we  use  the  notation  Ma.b.c  to  denote  the  method  we 
developed  in  9  a.b.c.  For  the  definitions  of  P;,  ii/  and  a,  see  (3.1.3)  and  (3.9.27). 

Table  la.  M2.3.1  ( tolerance  =  10"^°  ) 


Craig's  Method: 

3/  Ar  ,  t^i  =  'I 

=  1,  P,  =  ^,  '■ 

=  1,2 

a 

a 

a 

(fl:,ni) 

("i,":) 

('»l,'»2) 

0. 

1. 

2. 

3. 

0. 

1. 

2. 

3. 

0. 

1. 

2. 

3. 

(5.8) 

8 

8 

7 

7 

(8,8) 

8 

8 

7 

7 

(8,5) 

5 

5 

5 

5 

(10,16) 

13 

11 

9 

9 

(16,16) 

13 

10 

8 

8 

(16,10) 

10 

8 

8 

7 

(20,32) 

15 

11 

10 

10 

(32,32) 

15 

11 

10 

10 

(32,20) 

13 

11 

9 

9 

(40,64) 

22 

13 

12 

12 

(64,64) 

22 

13 

12 

12 

(64,40) 

15 

12 

11 

11 

(80,128) 

•  • 

15 

12 

12 

(128,128) 

•  • 

15 

12 

12 

(128,80) 

•  • 

13 

12 

12 

'•  more  than  30  iterations. 


Table  lb.  M2.3.1  (  tolerance  =  10" ^°  ) 


Orthodir  Method 

M-/ 

p,  Ar  .  ^^.,  =  s,=  l 

.  P,  =  ^, 

>=1,2 

a 

a 

a 

("i.":) 

("i,":) 

("i.":) 

0. 

1. 

2. 

3. 

0. 

1. 

2. 

3. 

0. 

1. 

2. 

3. 

(5,8) 

7 

7 

7 

7 

(8,8) 

7 

7 

7 

7 

(8,5) 

4 

4 

4 

4 

(10,16) 

14 

15 

15 

15 

(16,16) 

14 

15 

15 

15 

(16,10) 

9 

9 

9 

9 

(20.32) 

15 

21 

21 

21 

(32,32) 

15 

21 

21 

21 

(32.20) 

15 

17 

17 

17 

(40,64) 

15 

24 

25 

25 

(64,64) 

15 

24 

25 

25 

(64,40) 

15 

22 

22 

22 

(80,128) 

15 

27 

27 

27 

(128,128) 

15 

27 

27 

27 

(128,80) 

15 

25 

25 

25 
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Table  2a.  M2.3.1  (  tolerance  =  10"^°  ) 


Craig's  Method:    —  =  1  -  ^,  /»,»  ,  ji,  =  j,  =  3,  =  1,  i=l,2 

("i,":) 

a 

("i,":) 

a 

("i.":) 

a 

1. 

2. 

3. 

1. 

2. 

3. 

1. 

2. 

3. 

(5,8) 
(10,16) 
(20,32) 
(40,64) 
(80,128) 

8 
11 
12 
14 
16 

5 

8 

10 

12 
12 

7 

8 

10 

12 

12 

(8,8) 

(16,16) 

(32,32) 

(64,64) 

(128,128) 

8 
11 
12 
14 
16 

7 

9 

10 

12 

12 

7 

9 

10 

12 

12 

(8,5) 
(16.10) 
(32,20) 
(64,40) 
(128,80) 

5 

9 

11 

13 

15 

5 

8 

9 

11 

12 

5 

7 

9 

11 

12 

Table  2b.  M2.3.1  (  tolerance  =  10"^°  ) 


Orthodir  Method: 

^/  hr  .  y.i  =  si 

=  P/  =  1, 

'•=1,2 

("i,":) 

a 

("i,":) 

a 

("i,":) 

a 

1. 

2. 

3. 

1. 

2. 

3. 

1. 

2. 

3. 

(5,8) 

7 

7 

7 

(8,8) 

7 

7 

7 

(8,5) 

4 

4 

4 

(10,16) 

15 

15 

15 

(16,16) 

15 

15 

15 

(16,10) 

9 

9 

9 

(20,32) 

20 

21 

21 

(32,32) 

21 

21 

21 

(32,20) 

17 

17 

17 

(40,64) 

24 

25 

25 

(64,64) 

24 

24 

24 

(64,40) 

22 

22 

22 

(80,128) 

26 

27 

27 

(128,128) 

27 

27 

27 

(128,80) 

26 

25 

25 
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Table  3«.  M2.3.1  ( tolerance  =  10" i°  ) 


Craig's  Method:    —  =  1  +  3,  A,»  ,  jt,  =  j,  =  i,  p,  =  ^4,  ,=i,2 

("i.":) 

a 

("i.":) 

a 

("i.":) 

a 

1. 

2. 

3. 

00 

1. 

2. 

3. 

» 

1. 

2. 

3. 

00 

(5,8) 

(10,16) 

(20,32) 

(40,64) 

(80,128) 

8 

11 
13 
13 
15 

7 

8 

10 

12 

12 

7 

9 

10 

12 

12 

6 

7 

9 

11 

11 

(8,8) 

(16,16) 

(32,32) 

(64,64) 

(128,128) 

8 
10 

11 
13 
15 

7 

3 

10 

12 
12 

7 

9 

10 

12 
12 

6 

7 

9 

11 

11 

(8,5) 

(16,10) 

(32,20) 

(64,40) 

(128,80) 

5 

8 
11 
12 
13 

5 

8 

9 

11 

12 

5 

7 

9 

11 

12 

4 
6 
8 
9 

11 

Table  3b.  M2.3.1  (  tolerance  =  10"^°  ) 


Orthodii  Method:    —  =  1  +  ^,  A,»  ,   ^,  =  ,,  =  1,  p,  =  %,  i  =  l,2 

("i.":) 

a 

("i,":) 

a 

("i.":) 

a 

1. 

2. 

3. 

00 

1. 

2. 

3. 

00 

1. 

2. 

3. 

00 

(5,8) 

(10,16) 

(20,32) 

(40,64) 

(80,128) 

7 
15 
21 
25 
27 

7 
15 
21 
25 
27 

7 
15 
21 
25 
27 

7 
15 
21 
25 
27 

(8,8) 

(16,16) 

(32,32) 

(64,64) 

(128,128) 

7 
15 
21 
24 
27 

7 
15 
21 
25 
27 

7 
15 
21 
25 
27 

7 
15 
21 
24 
27 

(8,5) 

(16.10) 

(32,20) 

(64,40) 

(128,80) 

4 

9 

17 

22 
25 

4 
9 

17 
22 
25 

4 

9 

17 

22 

25 

4 

9 

17 

22 

25 
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Table  4a.  Eigenvalues  8^  of  B  B'  when  X,  <  jjl,. 


P,  A"  ,  Si 

=  1.  3, 

=  ^,  «,  = 

=  32,  i=l,2 

^1 

1 

1 

1 

1 

4 

4 

^2 

1 

1 

4 

4 

1 

1 

a 

1 

2 

1 

2 

1 

2 

2 

2 

1.25 

1.25 

5 

5 

8i 

0.0594 

0.0528 

0.1984 

0.1860 

0.0240 

0.0210 

82 

0.9999 

1.0000 

0.9239 

0.9733 

1.0000 

1.0000 

83 

1.5160 

1.6042 

1.0001 

1.0000 

4.0748 

4.4442 

8* 

1.8872 

1.9798 

1.2137 

1.2424 

4.5056 

4.9689 

85 

1.9181 

1.9972 

1.2256 

1.2492 

4.6062 

4.9896 

86 

1.9389 

1.9981 

1.2322 

1.2495 

4.7035 

4.9907 

810 

1.9687 

1.9991 

1.2420 

1.2498 

4.8340 

4.9951 

8:3 

1.9807 

1.9994 

1.2456 

1.2499 

4.8897 

4.9968 

820 

1.9858 

1.9996 

1.2470 

1.2499 

4.9144 

4.9975 

826 

1.9883 

1.9996 

1.2477 

1.2499 

4.9271 

4.9978 

827 

1.9885 

1.9997 

1.2477 

1.2499 

4.9282 

4.9978 

828 

1.9887 

1.9997 

1.2478 

1.2499 

4.9290 

4.9978 

829 

1.9888 

1.9997 

1.2478 

1.2499 

4.9296 

4.9998 

830 

1.9889 

2.0236 

1.2478 

1.2567 

4.9300 

5.0961 

831 

5.0099 

5.6095 

2.0014 

2.1362 

17.5431 

20.8104 

832 

15.5322 

16.7475 

5.1336 

5.1793 

59.3351 

73.0058 
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Table  4b.  Eigenvalues  8,  of  B  B'  when  X,  a  ji,. 


1  +  P,  hr 

,  */  =  1. 

P/  =  ^, 

1,  =  32, 

i=l,2 

^1 

1 

1 

1 

1 

1 

4 

4 

^2 

1 

1 

1 

4 

4 

1 

1 

a 

00 

1 

2 

1 

2 

1 

2 

2 

2 

2 

1.25 

1.25 

5 

5 

8i 

0.0526 

0.0454 

0.0523 

0.1696 

0.1851 

0.0180 

0.0208 

82 

1.0000 

0.9999 

1.0000 

0.9996 

0.9758 

1.0000 

1.0000 

83 

1.6069 

1.6893 

1.6096 

1.0136 

1.0000 

4.7318 

4.4646 

84 

1.9820 

2.0110 

1.9841 

1.2522 

1.2438 

5.0699 

4.9856 

85 

1.9993 

2.0111 

2.0003 

1.2522 

1.2501 

5.0703 

5.0022 

86 

2.0000 

2.0112 

2.0003 

1.2522 

1.2501 

5.0709 

5.0022 

810 

2.0000 

2.0121 

2.0004 

1.2524 

1.2501 

5.0759 

5.0023 

815 

2.0000 

2.0149 

2.0004 

1.2531 

1.2501 

5.0900 

5.0027 

820 

2.0000 

2.0209 

2.0006 

1.2548 

1.2501 

5.1203 

5.0036 

826 

2.0000 

2.0427 

2.0013 

1.2613 

1.2503 

5.2297 

5.0070 

827 

2.0000 

2.0513 

2.0016 

1.2639 

1.2504 

5.2773 

5.0086 

828 

2.0000 

2.0644 

2.0020 

1.2679 

1.2505 

5.3408 

5.0115 

829 

2.0009 

2.0869 

2.0030 

1.2747 

1.2508 

5.4727 

5.0213 

830 

2.0260 

2.1308 

2.028.5 

1.2879 

1.2582 

5.7004 

5.1254 

831 

5.6294 

6.2678 

5.6493 

2.2865 

2.1452 

24.5105 

21.0307 

832 

16.7993 

18.9737 

16.8519 

5.3911 

5.1870 

95.9518 

74.1012 
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Table  5a.  Smallest  Eigenvalue  8^  of  B  B'  when  X,  ^  (i.,. 


8i  X  100 

=  1-P,A 

",*/=! 

&i  ~  ^f 

«-=l,2 

Xi 

1 

1 

1 

1 

1 

4 

4 

^2 

1 

1 

1 

4 

4 

1 

1 

a 

00 

1 

2 

1 

2 

1 

2 

8 

48.6207 

48.9467 

48.7007 

65.7717 

65.9378 

44.2088 

44.9619 

16 

17.7985 

19.1492 

17.8906 

39.4756 

38.4127 

11.3761 

10.5256 

"/ 

32 

5.2563 

5.9433 

5.2783 

19.8390 

18.5981 

2.4024 

2.1027 

64 

1.3788 

1.5999 

1.3822 

8.3698 

7.5914 

0.4671 

0.3983 

Table  5b.  Largest  Eigenvalue  8„  of  B  B'  when  X,  s  ji,,. 


8. 

=  1  -  P,  A, 

"  .  */  = 

I,  P,  =  ^ 

.  '  =  1,2 

^i 

1 

1 

1 

1 

1 

4 

4 

^2 

1 

1 

1 

4 

4 

1 

1 

a 

00 

1 

2 

1 

2 

1 

2 

8 

5.1474 

4.7778 

5.0950 

2.0909 

2.1230 

15.8967 

19.0733 

16 

9.0321 

8.3977 

8.9830 

3.1472 

3.1792 

30.3597 

36.9295 

"( 

32 

16.7993 

15.5323 

16.7475 

5.1336 

5.1793 

59.3352 

73.0058 

64 

32.4994 

29.7953 

32.4431 

8.9788 

9.0954 

117.5722 

145.8702 
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Table  6a.  M2.3.1  ( tolerance  =  10"^°,  time  in  seconds  ) 


Craig's  Method:    —  =  1  -  p,  A,"  ,  ji/  =  •»/  =  1.  P/  =  ^.  '  =  1.2,  a  =  2 

"( 

8 

16 

32 

64 

128 

InitialiTiation 

0.005 

0.010 

0.021 

0.044 

0.084 

Iteration 

0.097 

0.201 

0.473 

1.066 

2.283 

No.  of  iteration 

7 

9 

10 

12 

12 

Time  per  iteration 

0.0139 

0.0223 

0.0473 

0.0888 

0.1903 

Generating  p 

0.036 

0.123 

0.451 

1.761 

7.398 

Total  time 

0.138 

0.334 

0.945 

2.871 

9.765 

Table  6b.  M2.3.1  ( tolerance  =  10"^°,  time  in  seconds  ) 


Orthodir  Method:    —  =  1  -  p,  A/*  ,   jt,  =  i,  =  1,  p,  =  %  i=l,2,  a  =  2 

«/ 

8 

16 

32 

64 

128 

Initialization 

0.007 

0.012 

0.023 

0.045 

0.093 

Iteration 

0.088 

0.329 

0.997 

2.447 

5.516 

No.  of  iteration 

7 

15 

21 

25 

27 

Time  per  iteration 

0.0126 

0.0219 

0.0475 

0.0979 

0.2043 

Generating  p 

0.035 

0.118 

0.446 

1.797 

7.363 

Total  time 

0.130 

0.459 

1.466 

4.289 

12.972 
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Table  7«.  M2.3.1  (  tolerance  =  10"^°  ) 


Craig's  Method:  j,  p.,  =  X,  +  i],  hf   ,  X,  =  ii,  =  1,  i=l,2 

'/ 

1 

5 

n,-l 

a 

0. 

1. 

2. 

3. 

0. 

1. 

2. 

3. 

0. 

1. 

2. 

3. 

(8,8) 

(16,16) 

(32,32) 

(64,64) 

(128,128) 

8 
13 

15 

22 

•  • 

8 
11 
12 
14 
16 

7 

9 

10 

12 

12 

7 

9 

10 

12 

12 

8 

13 
14 

18 

•  • 

8 
12 
14 
16 
17 

8 

11 
13 
14 
15 

8 

11 
13 
14 
15 

8 

13 
16 
16 
18 

8 

14 

16 

•  • 

•  • 

8 

14 

16 

•  • 

•  • 

8 
14 

16 

•  • 

•  • 

more  than  30  iterations 


Table  7b.  M2.3.1  ( tolerance  =  10"^°  ) 


Orthodir  Method:  s 

M-/  =  X(  +  "ni  hf   ,  X,  =  71,  =  1,  1  =  1,2 

'i 

1 

1        ' 

n,-l 

a 

0. 

1. 

2. 

3. 

0. 

1. 

2. 

3. 

0. 

1. 

2. 

3. 

(8,8) 

7 

7 

7 

7 

7 

7 

7 

7 

7 

7 

7 

7 

(16,16) 

14 

15 

15 

15 

14 

15 

15 

15 

14 

13 

13 

13 

(32,32) 

15 

21 

21 

21 

16 

21 

21 

21 

16 

16 

16 

16 

(64,64) 

15 

24 

25 

24 

16 

25 

25 

25 

16 

18 

18 

18 

(128,128) 

15 

27 

27 

27 

16 

27 

27 

27 

16 

20 

20 

20 
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Table  8.  M2.3.1  (  tolerance  =  10"^°  ) 


*/  M-j  =  ^f  +  ^/  *°   »   ^(  =  ^/  =  1.   '  =  1.2,   a  =  2 

Method 

point  SOR:  Initial  guess  p,y  "  0 

Orthodir 

"I 

'/ 

A/ 

« 
(1) 

Relaxation  factor  w 

Iterations 

1.0 

1.3 

1.5 

1.6 

1.7 

1.8 

1.9 

• 

Cl) 

4 
4 

1 
3 

16 
16 

1.600 
1.452 

322 
139 

170 
68 

95 
37 

52 
49 

76 
67 

133 
109 

471 
280 

52 

32 

3 

3 

8 
8 

1 

7 

64 
64 

1.794 
1.619 

•  • 
357 

939 
190 

583 
108 

434 
64 

295 
68 

153 
108 

932 

274 

151 

52 

7 
7 

more  than  1000  iterations 


Table  9.  M2.3.1  (  tolerance  =  10~^°,  time  in  seconds  ) 


Parameter 

*,  11,  =  X,  +  -n,  hf   ,   X,  =  J,  =  1,   i=l,2,   a  =  2 

Problem 

n,  =  16,  J/  =  1 

n,  =  16,  s,  =  15 

n,  =  40,  J,  =  39 

Dimension  N 

256 

256 

1600 

Method 

Orthodir 

pt  SOR 

Orthodir 

pt  SOR 

Orthodir 

pt  SOR 

(1) 

... 

1.891 

... 

1.734 

... 

1.833 

No.  of  iterations 

15 

403 

13 

78 

16 

130 

Tmie  for  iteration 

0.331 

14.894 

0.353 

2.967 

2.138 

30.596 

Time  per  iteration 

0.0220 

0.0370 

0.0272 

0.0380 

0.134 

0.235 

Total  time 

0.463 

14.916 

0.681 

2.989 

6.311 

30.692 

107 


Table  10a.  M2.4  ( tolerance  =  10"^° ) 


Craig's  Method: 

1  -  p,  A," 

M./  =  ■»/  =  1 

,  &t  =  % 

«=1,2, 

o  =  2 

P 

("i.":) 

1.00 

.75 

.50 

.25 

.10 

.01 

.001 

0 

Ao 

-.01 

-.25 

-.75 

(8.8) 

8 

8 

8 

8 

8 

8 

9 

10 

7 

8 

9 

9 

(16,16) 

16 

16 

15 

13 

12 

9 

9 

12 

8 

9 

16 

22 

(32,32) 

28 

26 

23 

20 

15 

10 

10 

14 

10 

10 

23 

39 

(64,64) 

48 

43 

37 

28 

20 

12 

11 

17 

12 

12 

38 

•  • 

more  than  50  iterations 


Table  10b.  M2.4  (  tolerance  =  10" i°  ) 


Orthodir  Method: 

=  1  -  ^/ hr 

.   M-f  =  •»/  = 

1,   P,  =  % 

«  =  1,2 

,   a  = 

2 

P 

("i.":) 

1.00 

.75 

.50 

.25 

.10 

.01 

.001 

0 

Ao 

-.01 

-.25 

-.75 

(8,8) 

7 

8 

8 

8 

8 

8 

•  • 

9 

7 

8 

8 

8 

(16,16) 

15 

15 

15 

16 

16 

16 

•  • 

17 

15 

16 

16 

16 

(32,32) 

24 

23 

22 

21 

21 

21 

•  • 

26 

21 

21 

23 

28 

(64,64) 

36 

33 

31 

27 

25 

24 

•  • 

30 

25 

25 

32 

44 

more  than  50  iterations 


\ 
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Table  11a.  M2.3.2  ( tolerance  =  10-^°  ) 


Orthodir  Method:  j 

,  ji,  =  X,  +  Tl,  A,"  , 

X,  =  Ti,  =  1,  1  =  1,2,  a  =  1 

*f 

*/ 

*/ 

(ii.ni) 

(ii.":) 

("i.":) 

1 

4 

n,-l 

1 

4 

n,-l 

1 

4 

«/-l 

(5,8) 

11 

11 

10 

(8,8) 

7 

7 

7 

(8.5) 

11 

11 

10 

(10.16) 

18 

18 

15 

(16,16) 

15 

15 

13 

(16,10) 

18 

18 

15 

(20,32) 

23 

23 

17 

(32,32) 

22 

22 

•  • 

(32.20) 

23 

23 

17 

(40,64) 

26 

26 

•  • 

(64,64) 

26 

27 

•  • 

(64,40) 

26 

26 

•  • 

more  than  50  iterations 


Table  lib.  M2.3.2  (  tolerance  =  10"  i°  ) 


Orthodir  Method:  s 

,  >i,  =  X,  +  Ti,  Af  ,  X,  =  Ti,  =  1,  «  =  1,2,  a  =  2 

'I 

'I 

'i 

("d"!) 

("i'":) 

("i-":) 

1 

4 

n,-l 

1 

4 

ni-l 

1 

4 

n,-l 

(5,8) 

11 

11 

10 

(8,8) 

7 

7 

7 

(8,5) 

11 

11 

10 

(10,16) 

18 

17 

14 

(16,16) 

15 

15 

13 

(16,10) 

18 

17 

14 

(20,32) 

23 

23 

17 

(32,32) 

21 

21 

•  • 

(32,20) 

23 

23 

17 

(40,64) 

26 

26 

•  • 

(64,64) 

26 

26 

•  • 

(64.40) 

26 

26 

•  a 

'•  more  than  50  iterations 
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Table  12.  M2.5.1  ( tolerance  =  10"^ ) 


Orthodir  Method:    s,  p.,  =  X,  +  -n,  A,"  ,   X,  =  ti,  =  1,   i=  1,2,3 

a 

a 

a 

(ni,n2,n{) 

N 

'1 

*/ 

*/ 

1. 

2. 

3. 

1. 

2. 

3. 

1. 

2. 

3. 

(4,4,4) 

64 

1 

10 

10 

10 

3 

9 

9 

9 

3 

9 

9 

9 

(8,8,8) 

512 

1 

14 

14 

14 

3 

14 

14 

14 

6 

13 

13 

13 

(16,16,16) 

4096 

1 

18 

18 

18 

3 

18 

18 

18 

9 

17 

17 

17 

Table  13.  M2.5.1  ( tolerance  =  10"^,  time  in  seconds  ) 


Orthodir  Method:    *,  p.,  =  X,  +  i],  A,"   ,  X,  =  ti,  =  1,  j,  =  3,  j  =  1,2,3,  a  =  2 

«t 

4 

8 

16 

Initialization 

0.015 

0.065 

0.308 

Iteration 

0.362 

3.492 

28.407 

No.  of  iteration 

9 

14 

18 

Tune  per  iteration 

0.0402 

0.249 

1.578 

Generating  p 

0.039 

0.407 

4.526 

Total  rime 

0.416 

3.964 

33.241 
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Table  14.  M2.5.1  (  tolerance  =  10"^  ) 


s,  jjt,  =  X,  +  T],  hf   ,   X,  =  Ti,  =  1,   1  =  1,2,3,   a  =  2 

Method 

point  SOR:  Initial  guess  pij  ■  0 

Orthodir 

«/ 

ii 

N 

* 

Relaxation  factor  u 

Iteratiom 

1.0 

1.4 

1.5 

1.6 

1.7 

1.8 

1.9 

• 

(1) 

4 
4 

1 
3 

64 
64 

1.700 
1.593 

474 
183 

206 

76 

159 

54 

115 
34 

69 

53 

190 
104 

•  a 

•  a 

69 
30 

10 
9 

8 
8 

1 
7 

512 
512 

1.831 
1.715 

•  • 
458 

•  • 
199 

907 
154 

660 
111 

489 
64 

300 

77 

•  • 

305 

242 
49 

14 
12 

more  than  1000  iterations. 


Table  15.  M2.5.1  ( tolerance  =  10"*,  time  in  seconds  ) 


Parameter 

J,  p.,  =  X,  +  -n,  hf  ,  X,  =  -n,  =  1,  j  =  1,2,3,  a  =  2 

Problem 

n,  =  4,  J,  =  3 

n,  =  8,  *,  =  7 

n,  =  8,  5(  =  1 

Dimension  N 

64 

512 

512 

Method 

Orthodir 

pt  SOR 

Orthodir 

ptSOR 

Orthodir 

ptSOR 

• 
(1) 

... 

1.593 

... 

1.715 

... 

1.831 

No.  of  iterations 

9 

30 

12 

49 

14 

242 

Time  for  iteration 

0.364 

0.498 

2.822 

5.936 

3.365 

31.225 

Time  per  iteration 

0.0405 

0.0166 

0.2352 

0.1211 

0.2404 

0.1290 

Total  time 

0.420 

0.529 

3.274 

5.997 

3.815 

31.282 

Ill 


Table  16a.  M2.6.1  ( tolerance  =  10"* ) 


Orthodir  Method:    s,  ^^., 

=  \,  +  -r\ 

/  ''(*  >  ^(  ~ 

T1,  =    1,     1  =  1,2 

("i.":) 

a=  1. 

a 

=  2. 

a  =  3. 

(*1.*2) 

Iterations 

i'i,'2) 

Iterations 

(*l,-»2) 

Iterations 

(10,10) 

(2,2) 

11 

(5,5) 

10 

(4.4) 

11 

(20,20) 

(4,4) 

15 

(5,5) 

15 

(8,8) 

13 

(40,40) 

(8,8) 

17 

(5.5) 

18 

(16,16) 

16 

(80,80) 

(16,16) 

20 

(5.5) 

22 

(32,32) 

18 

Table  16b.  M2.6.1  ( tolerance  =  10"'  ) 


Orthodir  Method:    j,  pi,  =  X,  +  ti,  hf   ,   X,  =  t|,  =  1,  »  =  1.2 

(^i.^:) 

a  =  3. 

a  =  2. 

a  =  2. 

("i.":) 

Iterations 

("i.":) 

Iterations 

("i.":) 

Iterations 

(10,10) 
(10,10) 
(10,10) 
(10,10) 

(17,18) 
(25,26) 
(41,42) 
(73,74) 

12 
14 
17 
20 

(27,28) 
(35,36) 
(51,52) 
(83,84) 

15 
16 
18 
21 

(15,15) 

(30,30) 

(60,60) 

(120,120) 

11 
16 
19 
23 
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Appendix 


A.l         Altercativea  (B)  &  (C) 

For  the  models  discussed  in  9  2.3.1,  there  are  two  other  alternatives  to  compute  the 
quantity  E  Aq  E  y.  They  both  have  advantages  over  Alternative  A.  The  choice  of  the  best 
alternative  depends  on  the  problem  itself. 

Alternative  (B):-  Partial  diagonalization  cf  A^  by  (5^  d  ®  /) 

Recall  that  by  (2.1.5)  and  (2.1.12),  we  have 

T  -  (Gl  S:'  ®  /)  Ao  (5i  Qi  ®  /)  =  (Fi  ®  /  +  /  ®  G2). 

Here  T  =  ^ag{T^ T„),  and  each  block  Tj  =  y^j  I„^  +  Gj  is  a  tridiagonal  matrix. 

Since  71 .  =  0,  r„  =  Ci  is  singular.  Since  Aq  has  only  a  one  dimensional  nuil-space,  all  the 
other  Tj,  l-sj<n^,  are  non-singular.  By  (2.1.12)  and  (2.1.15),  the  generalized  inverse  of  T„^ 
is  given  by 

T:^  =  G^  =  S2  Q2  r^  Ql  S{K  (A.1.1) 

Since  each  T,,  l^jKn^,  is  nonsingular,  the  generalized  inverse  of  T  is  given  by 

r*  =  diag  (rr^ r-ii ,  s^  O2  n  Ql  S{'). 

Thus  the  generalized  inverse  of  iAg  is  given  by 

K  =  (5iGi®/)r^(C?l5r^®/). 
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By  (2.!^.  11),  a  straightforward  computation  shows  that  for  any  vector  y  i  R  ^, 
E'  A^Ey=  2  {^q„^jf  Tf' y  +  i\^)'  S,  Q,  F^  Ql  S{^  y. 


(A.1.2) 


Since  for  each  l^jKnj,  Tj  is  tridiagonal,  computing  Zj  ■  Tf^y  requires  Srtj  operations.  Thus 
the  first  term  can  be  evaluated  in  6(n^—l)n2  operations,  provided  that  {'■q„  j)^  are  computed 

and  stored  before  the  iteration.    We  claim  that  the  second  term  can  be  computed  in  0(n2) 
rather  than  0(nl)  operations.  We  first  prove 

LEMMA  A.1.1 

/?"'  =  {Sl  I2}  ©  ImiGi),  and  the  projection  Py  of  any  >  €  /?"'  onto  ImiGj)  is  given  by 


Py  =  y-illy)Sll2. 


(A.1.3) 


Proof:  The  first  statement  follows  directly  from  the  spectral  decomposition  of  G2  in 


(2.1.12).  To  prove  (A.1.3),  we  first  write  Cj  ^2^  y  = 


,  where  3  is  a  scalar.  By  (2.1.12) 


and  the  fact  that  72^  =  O1  S2  G: 
length  Tij-l.  Thus  by  (2.1.10), 


is  the  null- vector  of  G,,  where  0  is  the  zero  vector  of 


S2Q 


2  W2 


=  5?  1 


2  *2- 


(A.1.4) 


This  gives  [OM]  =  l^  5:  Q2.  Thus  0  =  [OM] 


=  I2*  ^2  C2  Ql  S{'y  =  11  y.  Hence 


Py  =  S2Q 


2  V2 


=  S2Q 


2  W2 


-&S2Q2 


=  y-ii'2y)sli2.   o 


We  note  that  Py  can  be  computed  in  2«2  operations  if  5^  Ij  is  stored  beforehand.  By 
(A.1.3),  (A.1.4)  and  (2.1.15), 
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the  second  term  in  (A.  1.2)  becomes 

S2  Qi  r^  Ql  S{'y  =  S2  Q2  rt  Ql  S2'  (Py  +  (I2  y)sl  I2) 

=  52  G:  r^*  Ql  S{'  Py  +  y  (l^  y)  Sl  I2,  (A.1.5) 

where  y  is  the  arbitrary  fixed  constant  given  in  (2.1.15).  The  second  term  in  (A.1.5)  can  be 
obtained  in  n2  operations  if  -y  #  0.  Notice  that,  by  the  definition  of  P,  the  first  term 
z  m  S2Q2  ^2  Ql  ^2^  ^  y  '*  *^  unique  element  in  Im{G^  such  that  G2Z  =  P  y.  Moreover 
by  (2.1.10),  for  all  scalar  ti,  02(2  +  11  Sl  Ij)  =  P  y.  Since  Sl  Ij  is  a  positive  vector,  there 
exists  an  -qo  such  that  the  first  entry  of  f  -  (r  +  -no  Sl  I2)  is  1.  Using  the  fact  that  Gj  is 
tridiagonal,  the  remaining  entries  of  f  can  be  solved  by  forward  substitution.  This  requires 
3n2  operations.  Since  z  €  Im{G-^  is  the  projection  of  f  onto  Im(G2),  by  lemma  A.1.1, 
z  =  f  -  (Ij  f)  sl  I:-  This  can  be  computed  in  another  2n2  operations.  Thus  the  second  term 
in  (A.  1.2)  can  be  computed  in  at  most  8n2  operations. 

Hence  we  see  that  E'  Aq  E  y  can  be  computed  in  about  6nin2  operations.  Using  the  fact 
that  Tj  has  constant  subdiagonal  entries,  i.e.,  (Tj),j.i  —  -X2,  we  can  reduce  the  operations 

count  to  4n^n2.  In  fact,  instead  of  solving  Tf^  y  in  (A.1.2),  we  can  solve  wy  ■  (  t—  Tf^ )  y. 

The  subdiagonal  entries  of  this  matrix  are  all  -1,  so  that  wj  can  be  computed  in  3/i2  operations 
rather  than  the  usual  5/12  operations.  Notice  that  (A.1.2)  becom.es 

E'A^Ey=  "2  (>^2  \/)  ^j  +  (\^y  U-^y  (I2  y)sl  12]- 

y-1 
With  (X2  ^q„  /)  computed  and  stored  beforehand,  this  computations  requires  only  4nji2 
operations.  U  n^«  n2,  this  alternative  requires  much  less  work  compared  to  Alternative  A. 
Notice  also  that  it  is  not  necessary  to  generate  and  store  {G2>^2}-  Recall  that  in  alternative  A, 
we  need  to  store  both  Q^  and  ^2-  Thus  this  alternative  saves  half  the  storage  spaces 
compared  to  Alternative  A.  Moreover,  when  J;  =  1,  only  0(/i,)  storage  spaces  are  required. 

The  following  alternative,  on  the  other  hand,  does  not  require  the  eigenpair  {Ci ,  FJ. 
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Alternative  (C):-  Partial  diagonalization  of  A^  by  {I  ®  S2  Q2) 

RecaU  that  by  (2.1.5)  and  (2.1.12),  we  have 

(/  ®  Ql  S{')  Aq  (/  ®  5:  C:)  =  (Gi  ®  /  +  /  ®  Tj). 

Let  P  be  the  permutation  matrix  of  order  N  =  tiin^  such  that 

P\U  ®V)P  =  V®U, 
for  any  U  of  order  n^  and  V  of  order  nj.  By  (2.1.5)  and  (2.1.12),  we  have 

^'P'  (I®  Ql  Sj^)  Ao  (/  ®  5:  Q2)  ^  =  (^2  ®  /  +  /  ®  GJ. 
Here  ^  =  diag  (^^ ,  .  .  .  ,  ^„ ),  and  each  block  ^y  =  G^  +  iij-In  »  a  tridiagonal  matrix. 
Since  72^  =  0.  ^n  -  ^i  •*  singular.  All  the  other  ^y,  1  s  y  <  nji  ^^  nonsingiilar  as  Aq  has 
only  one  dimensional  null-space.  Similar  to  (A.1.1),  we  have 

where 

rr  =  diag  iill -yf^^-i .  l), 

with  7  defined  arbitrarily.  The  generalized  inverses  of  ^  and  Ag  are 

^^  =  diag  (^IT' ^„','-i.  ^;;.  (A.1.6) 

and 

A^  =  (/  ®  52  G2)  ''  ^^  P'  (l  ®  C:  ^2"')  (A.1.7) 

respectively.  Hence 

£*  Ao"-  £  =  £  (/  ®  ^2  G2)  ^  "^^  ^*  (^  ®  G:  ^2"^)  ^ 
=  S2  O2  £'  P'ir^P'  E  Ql  S{K 

A  straightforward  computation  gives 

E'  p^*  p-  E'Cl^  diag  ((Oi <o„p, 

where 

«y  =   1^;  ^j-'  \  =   1^;  (Gi  +  727  '"P"'  ''n..  1^><«2'  (^1-8) 
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and 


w. 


=  'e:  ^::  \ 


n.       fl-      n. 


"1       "a      "I 

Thus 

Compare  this  with  (2.3.15),  we  sec  that  ft  =  <!>.   By  lemma  2.3.2,  ta„^  can  also  be  defined 
arbitrarily. 

Thus,  before  we  start  our  iteration,  we  generate  and  store  {QiXi)-  ^^  ^^^  solve  and 
save  (i)y  according  to  (A.  1.8).  Since  (G^  +  ijj  0  '*  ^  tridiagonal  matrix,  and  only  the  last 
entry  of  the  solution  is  required,  wy  can  be  generated  in  3  n|  operations  for  eadi  j.  With  ft 
stored  before  the  iteration,  the  computation  of  E'  Aq  E  y  can  be  done  in  2  n^  +  0(n2) 
operations.  We  remark  that  there  is  no  need  to  generate  the  eigenpair  {Qi,rj.  Thus  it  saves 
half  the  storage  spaces  compared  with  Alternative  A.  Qearly  this  will  be  the  best  choice 
when  Hj  «  n^.  If  *2  =  li  this  algorithm  needs  only  ©(nj)  work  and  storage  per  iteration. 
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A.2         An  Algorithm  for  the  Model  In  i  2.3.2 

In  this  section,  we  give  an  algorithm  for  computing  the  product  E'  R  Aq  E  y  given  in 
(2.3.36).  This  algorithm  requires  about  6n^  operations  and  n^  memory.  In  the  single  server 
case,  they  can  be  reduced  to  5n^  and  0(n)  respectively.  The  idea  is  to  diagonalize  Aq 
partially  by   (/ 0  5:  d)   ^  ^  Appendix  A.1,   and  accumulate  the  result  bloclcwise  as 
mentioned  in  9  2.3.1. 

By  (2.3.33),  we  have,  for  any  >-  C  /?", 

Ey  =  ^^\+\®'y,  (A.2.1) 

where  'y  €  R"'  such  that  2('>)y~^-   ^^^  ^®  "*®  (Oy  ^°  denote  the  y-th  entry  of  the  vector. 

Since  (Ey)^-  =  (^)„  +  Cy)„ ,  we  can  always  fix  one  of  ('y)„  arbitrarily.  Let  us  set 
(V)„  =  0.  By  (A.  1.7),  we  have 

E'  RA;  Ey  =  E'  R(I®  S2  Q2)  P  ^*  P'  (/  ®  Q'l  S^^)  E  y.  (A.2.2) 

Let  us  divide  this  computation  into  three  steps: 

Step  I.  Computation  of  u  "  P'  {I  ®  Q'j  Si^)  E  y . 

We  first  note  that  by  (A.2.1)  and  the  definition  of  P, 

«-/>•(/  ®  Ql  S^')  Ey^Ql  S{'  ■  \  ®  V  +  C2  Si'  ■  -y  ®  \.         (A.2.3) 

Thus  if  Ql  S{'  •  ^e„  is  generated  and  stored  before  the  iteration,  we  only  need  to  compute 
Ql  S2^  •  ^y  here.  This  requires  only  n\  +  nj  operations.  It  also  need  n\  memory  spaces  to 
hold  Q2  and  two  linear  arrays  for  Ql  S{'  ■  ^e„  and  QIS2'  •  ^y.  When  ^2  =  1.  this  step 
requires  n2logn2  +  12  operations  and  0^/12)  storage. 

Step  II.  Computation  ofv<^'^'u. 
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Let  us  partition  u  in  (A.2.3)  as  u  =  [  u^ .....  u„^],  where  each  block  u,  is  of  length 
n,.  Notice  that  there  is  no  need  to  store  the  whole  vector  u  which  is  of  length  n^rij.  With 
Q*  5-1  •  ^e„  and  QlS^^  •  ^y  stored,  we  can  generate  u,  one  at  a  time  by  (A.2.3).  Let  us 
define  v  ■  ^^  u,  and  partition  v  as  in  «.  Then  by  (A-1.6),  we  have, 


^y  = 


J.  ■-  (A.2.4) 


Since  ^,,  1  ^  y  <  n^,  are  tridiagonal,  each  vj  can  be  computed  in  Sn^  operations.  Using  the 
fact  that  the  ^1  have  constant  subdiagonals,  this  can  be  reduced  to  3n^  operations.  v„  can  be 

computed  by  using  the  method  mentioned  in  8  A.1,  Alternative  B,  i.e.,  we  fix  the  first  entry 
of  v„    and  solve  for  the  other  entries  by  forward  substitution.  This  requires  only  Sn^ 

operations.  Thus  the  whole  step  can  be  done  in  approximately  Sn^nj  operations. 

Step  III.  Computation  cf  E'  R  (I  ^  S2  Qj)  P  v. 

Let  us  write  w  =  [w^ .....  w„  ]  ■  /*  v,  where  each  block  w,  is  a  vector  of  length  rij. 
Notice  that  (>v,)y  =  (v^),.  Thus  no  computation  is  required  for  w.  Notice  also  that 

(/  ®  52  Q2)w  =  [  52  C:  H-i 52  G2  ^n)- 

Since  this  product  will  be  multiplied  by  E'  R,  we  only  need  to  compute  the  projection  of 
(/  ®  52  Qi)  w  onto  Im(i?).  In  fact,  if  we  let 

t'liS,  C2  >^i)n, (^2  C2  ^.r^".  '  °  1  ^*'''' 

we  then  have 

E'RiI®S2Q2)w  =  E-R[t^\+\®SjQ2w,).  (A.2.5) 

Thus  we  only  have  to  compute  (52  Qj  >♦'()„  ,  1  ^  1  <  n^,  and  52  C2  **'«  •  Notice  that 

(^2  Ql  ^i)n,  =  S  C<l.,j  'dj)i^,)j  =  2  Cq„^j  'dj)  (v;),.     1  *  «'  <  n^,  (A.2.6) 
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and 

(5:  G:  w  ),  =  2  Cltj  ^)i^n)j  =  2  Qrij  ^)  (vy)„,.    1  ^  '  <  "2-  (A.2.7) 

Thus  it  is  dear  that  if  (^q„  ,  ^dj)  is  computed  and  stored  before  the  iteration,  this  step 
requires  only  2n\  operations.  Moreover,  there  is  no  need  to  generate  and  store  the  whole 
vector  V  which  is  of  length  n^n^,  in  step  n  before  we  start  computing  t.  In  fact,  we  can 
generate  the  vj  by  (A.2.4)  and  accumulate  their  contributions  to  r  and  (^j  Q2^n  )•  o°6  at  a 
time,  according  to  (A.2.6)  and  (A.2.7). 

Notice  that  since  R  is  sparse,  the  matrix- vector  product  of  R  with  the  vector  in  (A- 2. 5) 
can  be  computed  in  0{ni)  operations.  Thus  combining  steps  I,  n  and  m,  we  conclude  that 
this  algorithm  requires  only  S/if  +  3n;/i2  operations  and  n\  storage  spaces  for  Q^  and  some 
storage  spaces  for  vectors  of  length  n^  or  fij.  When  jj  =  1,  there  is  no  need  to  hold  Q^,  so 
the  memory  requirement  is  then  linear  in  n,.  When  only  j^  =  1,  we  notice  that  this  problem 
is  symmetric  with  respect  to  the  queues.  We  can  repeat  the  argimient,  diagonalizing  A^ 
partially  by  (/  ®  Cj  Sf  ^)  rather  than  by  (/  ®  Q\  5f  ^).  This  will  give  us  the  same  count. 
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A.3         An  Algorithm  for  the  Model  in  i  2.5.1 

We  will  give  an  algorithm  that  computes  RqAq  ^'m  (2.5.6)  in  0(nf)  work  and  0(n}) 
storage  spaces.  The  idea  is  exactly  the  same  as  in  9  A.2.,  i.e.,  we  diagonalize  Aq  partially 
and  accumulate  the  results  blockwise.  To  begin  with,  let  us  define  the  permutation  matrix  P 
as  the  matrix  of  dimension  A^  ■  n^  n2  n^,  such  that 

P  (U  ®  V  ®  W)  P'  =  V  ®  W  ®  U, 
for  any  U,  V  and  W  of  order  n^,  n2  and  n^  respectively.  Moreover,  let 

r  -  (/  ®  /  ®  53  Cj)  (/  ®  S2  G:  ®  /)/»  =  (/  ®  S2  G:  ®  ^3  C3)  P- 
Using  (2.1.22),  we  have, 

^i'  -  r-^.io  r  =  (/  ®  Fj  ®  /)  +  ( r^  ®  /  ®  /)  +  (/  ®  /  ®  cj. 

Here 

^I'  =  diag  (>!':.; ^i^^ ,  ^2.1 *n,^;.  (A.3.1) 

with  tridiagonal  matrices 

"^jj.  =  "T:/  +  yyj^  +  Gi.  (A.3.2) 

We  remark  that  only  ^„  ^  =  G^  is  singxilar  since  Aq  has  only  a  one  dimensional  null-space. 
Similar  to  fi  A. 2,  we  define  the  generalized  inverse  of  ^  as 

^*  =  inj ^r,i,,-i  .  *;^3).  (A.3.3) 

where 

^n*^,  =  5iC2,rrGi5r^  (A.3.4) 

Therefore 

We  divide  the  calculation  of  the  cost  into  three  steps.  The  first  step  calculates  the  cost  of 
computing  r  ■  T""^  5,  the  second  step  the  cost  of  computing  u  ^  '^'*'  t  and  the  final  step  the 
cost  of  computing  RqT  u. 
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STEP  I.  Computation  cft^T'^^. 

For  simplicity,  we  use  h  to  denote  a  vector  of  length  nyiy  partitioned  into  n^  blocks. 
More  precisely, 

^x-[^xi ix„y 

where  each  •'x^,  Isjfcsn,  is  a  nj-vectors.  We  also  define  for  any  n2-vector  e  and  any  njiy 
vector  ^x,  the  product  ©  as, 

*  ©  ^x  -  [«  ®  ^xi e®  \y. 

This  is  an  ^-vector,  with  each  «  ®  ^x^  an  njnj-vector.    By  (2.5.3),  each  x  i  lm{R^  can  be 

written  as 

X  =  \^  ©  ^x  +  'e„^  ®  ^x, 
where  •'^^  are  the  Jt-th  unit  vector  in  R"'.   Since  the  last  block  of  x  is  the  sum  of  ^x„  and  ^x„  , 

one  of  these  vectors  can  be  set  to  zero.  Without  loss  of  generality,  let  us  set  ^x„  =  0  for  all 

X  €  /m(J?o)-  In  particular,  since  ?  €  /m(/?o), 

where  ^5„  =  0.  Notice  that 


%  =  ^e„^®'k^'e„®H, 


■i 


y  -  (/  ®  /  ®  g;  53-^)  5  =  2^„^  ©  V  +  \  ®  ^y, 
where 

^^v*  =  G3  53-'  (^W,  l=s*sny,  7  =  1,2. 

This  computation  requires  (n^  +  n{)n\  operations.  Next  we  observe  that 

2  -  (/  ®  Q2  5f^  ®  /)  y  =  (G:  5f  ^  2^„;  ©  b'  +  ^^„^  ®  '^,  (A.3.5) 

where 

'z  =  (G2  5r^®/)e>'). 

Therefore 
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where  ^q^j  are  *«  (*j)  entry  of  Qi,  and  ^d^  are  given  in  (2.1.7).  Thus  the  second  term  in 
(A.3.5)  can  be  computed  in  n^/13  operations.  If  G2  -^r^  ^«n  i*  computed  and  stored  before  we 
iterate,  the  first  term  in  (A.3.5)  requires  A^  operations.  Thus  z  can  be  generated  in  A^  +  n^nj 
operations. 

Next,  let  t  "  P'z.  This  can  be  done  by  indexing  and  requires  no  extra  cost.  In  fact,  if 
we  partition  t  into  »i2''3  blocks  according  to  the  partition  of  ^  in  (A.3.1),  and  z  into  n^n2 
blocks  as  ?  -  [2^^ .....  Zi^^ .....  ^n^^^*.  where  each  z^j  is  an  nj-vector;  then 

(tjj)k  =  (ZiA«  l^*^"! .  l^j^nj  ,  Is/Sflj.  (A.3.6) 

Here  we  have  used  (  •  )y  to  denote  the  ;-th  entry.  Thus  the  cost  for  computing  t  =  T'^i'ti 

("1  ■•■  "i)"]  +  N  +  n^Hj    operations.  (A-3.7) 

STEP  11.  Computation  qfu^'^^t. 

Next  let  us  calculate  the  cost  of  computing  u  -  ^I'*  r  =  ^+  T~^  g.  From  (A-3.1)  and 
(A.3.2),  we  have  to  solve  for  Ujj^  in 

'^jjc  »j^  =  illjJ  +  1i^  +  GJ  ujj,  =  tj^  l^j^n2,   l^k^ny  (A.3.8) 

If  j^Hj  or  it^n3,  ^y^  is  tridiagonal  and  inverdble,  hence  each  Ujj^  can  be  generated  within 
5n;  operations.  Using  the  fact  that  G^  has  constant  sub-diagonal,  this  can  be  reduced  to  3n^ 
operations,  see  Appendix  A.1  for  more  detail.  If  j  =  n2  and  ik=n3,  by  (A. 3. 4),  we  have  to 
solve 

SiQirtQ:S{'u„^^^=t„^^^,  (A.3.9) 

which  requires  only  2nl  operations.  If  we  use  the  method  mentioned  in  Appendix  A.1,  this 

count  will  be  reduced  to  0(n  J  operations.   Thus  the  total  cost  of  generating  Uj^  for  all  j,k  is 

approximately 

3N  operations.  (A.3.10) 
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STEP  III.  Computation  (^Rq  T  u. 

Notice   that  RqT  u  =  R^il  ®  I  ®  S:iQi)  {I  ®  SiQj®  I)  P  u.    Let   v  -  P  «.    If  we 
partition  v  as  v  =  [v^  j^ .....  v^^  .....  v„  _„  1'  where  v^^  are  nj-vector,  then 

(ykj)t  =  («y;)*  istsn^,  l:Sy:Sn2,  l^/S/ij.  (A.3.11) 

Thus  this  step  can  be  done  without  any  arithmetic  cost.    Next  we  observe  that  since  R^  is 

sparse,  it  suffices  to  consider  the  cost  of  computing  E  T  u,  where  E  is  the  projection  operator 

onto/m(i?o)-   By  (2.5.3),  this  projection  operator  satisfies 

E  (/  ®  /  ®  ^3  Gj)  (/  ®  52  G2  ®  /)  =  (/  ®  /  ®  Si  Cj)  £  (/  ®  52  O2  ®  /)• 

Let  w  -  £  (7  ®  52  G2  ®  ')  v-  Since  ET  u  i  ImiRo),  we  have 

w  =  2^„^  ©  Hv  +  ^e„^  ®  ^w,  (A.3.12) 


where 


Vd„)j,\j,vjj^    l^j<n„ 


^j  =  \ 


and 


i-l 


0  /  =  n,.  (A.3.13) 


J  =  "i 


2h,    = 


'J  =  (.^)S'*7yjk  v„  ,,,  1  s  ;•  5  ^2.  (A.3.14) 


i-i 


Qearly  the  computation  of  Hv^  requires  A^  operations  whiJe  ^wj  requires  n]  rty  operations. 
Hnally  let 

r  -  £  Ao*  5  =  (/  ®  /  ®  53  G3)  w  =  \  ©  V  +  1*,^  ®  2r, 
where 

■'''*  =  ■SsQs-'*^*.  lst^«^j  =  l,2. 

This  step  requires  (n^  +  n2)n]  operations.  Thus  £  7  a  can  be  generated  in 

N  +  nlrty  +  (fli  +  n2)nl   operations.  (A-3.15) 
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Combining  (A.3.7),  (A.3.10)  and  (A.3.15),  we  conclude  that  the  product  E  Aq  (  can  be 
generated  in 

SN  +  2nl  n^  +  2(ni  +  nj)  nj  operations.  (A.3.16) 

When  n^=  n2  =  n^  this  count  is  IIN.  Since  Rq  is  sparse,  the  dominant  work  in  eadi  iteration 
is  the  computation  of  E  Aq  5-  Thus  we  see  that  the  work  per  iteration  is  approximately  IIN. 
We  remark  that  this  can  be  further  reduced  to  ION.  In  fact,  we  can  combine  the  cross  product 
in  the  first  term  in  (A.3.5)  and  the  cross  product  in  the  first  term  in  (A-3.12). 

We  note  that  it  is  not  necessary  to  generate  and  store  all  the  entries  of  t  and  u  before 
we  can  compute  ^*  r  and  T  u  respectively.  In  fart,  we  can  first  generate  r^^,  one  at  a  time, 
by  (A.3.5)  and  (A.3.6).  In  (A.3.5),  we  only  need  to  store  V  and  h,  which  requires  only 
("i  ■•■  "2  ~  1)''3  storage.  After  t^j  is  generate,  we  can  compute  u^j  by  (A.3.8)  or  (A.3.9), 
and  accumulate  the  result  of  T  u  to  the  corresponding  locations  before  we  generate  another 
ti^j.  More  precisely,  by  (A.3.11),  (A.3.13)  and  (A.3.14),  right  after  u^^  is  generated,  we  can 
update  the  /-entry  of  every  Vy  and  ^Wj  in  the  following  manner: 

(S)/  =  C'''.;  S'^Fn^jk  (vy;k),  =  ed„)  t\j.  ("kjh  1  s  ;  <  n„  1  S  /  S  »3, 

and 

e^j),  =  Odj)  i^^qjj,  {Vnjr  =  Cdj)  t'qjj.  («,;).,  1  ^  j  ^  n„  1  :S  I  ^  n,. 

i-i  '  i-i  ' 

It  is  clear  that  this  way  of  accumulating  the  results  will  not  increase  the  cost  of  computation, 
and  we  only  need  to  store  a  few  vertors  of  length  0(nf)  besides  the  Qi'i. 
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A.4        Fast  Fourier  Transform  for  Arbitrary  n 

In  5  2.3.1,  we  have  to  compute 

yk=^^j'  "   '  0^k<n.  (A.4.1) 

y-i 
In  this  section,  we  will  show  that  the  vector  y  ■  [  y^,  .  .  .  .  y„  ],  can  be  obtained  in  O(nlogn) 

operations. 

We  begin  by  rewriting  (A.4.1)  in  a  standard  form.  Thus  let  ry  ■  0  for  n  <  y  s  2n,  we 

have 

y*  =  2^y  *   ^'   .  0  =5  it  <  /I,  (A.4.2) 

y-i 

where  iV  «  2n.  Notice  that  if  we  set  Zj  -  z^\-j),  0^  j  ^  N-1,  then 

y-l         ..  2irlkj 

yk='Zij'~    ""'    .  OsJt</i.  (A-4.3) 

y-0 

Thus  without  loss  of  generality,  we  can  assume  that  we  are  using  formula  (A.4.3). 

If  n,  and  hence  N,  is  a  power  of  2,  then  (A.4.3)  can  be  computed  by  the  regular  Fast 
Fourier  Transform.  The  work  required  is  N  \ogt^  =  2n  logn  +  0(n),  see  Cooley  and  Tukey 
[12].  For  other  values  of  n,  we  need  the  foUowing  standard  results  in  Fourier  analysis.  The 
proof  can  be  found  in  Brigham  [9]. 

LEMMA  A.4.1 

For  any  A^-vectors  y  =  [yi,  •  •  •  ,  y\]  and  z  =  [  z^ Zy  ],  define 

(Fvz),-f,-^2'^y/  ^^-   ,  (A.4.4) 

'»      JmQ 

.V-1 

y-0 

{'■y)k''kyk,  (A-4.6) 

for  0  s  Jfc<  AT.  We  have, 
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.v-i       liK 
(1)  ^y  =  2  ^*  «    '■    .  OrsjKN. 


(A.4.7) 


i-O 


Thus  ^'W  Fv  is  a  unitary  matrix, 


(2)  Fs(zy)  =  i*y, 

(3)  Fsiz*y)  =  Niy.    Q 


(A.4.8) 
(A.4.9) 


By  (A.4.3),  for  0  sJt  <  iV,  we  have 


yt=Si;*-'  ='     ■'■    D^y'     •'■    *     ■'■      -'*'     •'■  .      (A.4.10) 

y-o  y-o 


where 


■«*  =  2  ^y  «     ^    '     ^' 


y-o 


(A.4.11) 

Notice  that  the  vector  y  can  be  obtained  in  0(n)  operations  once  the  vector  x  is  computed.  To 
compute  {  Ji  })i^i,  we  first  choose  an  A/  ^  2A^- 1  such  that  A/  is  a  power  of  2.  We  then  define 


and 


a,  = 


zje 


wtJl 
.V 


O-S  j  :S  N-1, 

Otherwise, 


(A.4.12) 


Py  = 


.V 


0 


j  =  M  -  l,0<  KN, 
Otherwise. 


We  note  that  by  (A4.12),  (A.4.11)  becomes 


(A.4.13) 


.v-i 


^k=  2ay3(i-y)m«<.v.  Orsk<N. 
y-o 

Let  /  -  *-;.  Since  0^k,j<N,-N<l<N.UO-sl<N,  then  I  <  M,  hence 


(A.4.14) 


P/«x/.v=  3/  =  3/««<Af.        forO=s/<^. 
If  0  <  -l<  N,  then  by  (A.4.13),  Pv-(-;)  =  &s-{-iy   Hence, 


(A4.15) 


P/««<,v  =  P,v+/  =  P«-(-/)  =  P.v-(-/)  =  P(«orf.v.        for   -N<1<0.     (A4.16) 
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Combining  (A.4.15)  and  (A.4.16),  (A.4,14)  gives 

-v-i 

^k=  2  "y  ■  ^(k-j)modM  =  (  a  *  3  )i. 
y-o 

where  o  =  [  oq,  .  .  .  ,  a^-i  ]  and  3  =  [  3o.  .  .  .  .  ^m-i  ]•  Hence  by  lemma  A.4.1, 

x  =  Fi;i'(M&-^)  =  Fi^'(MF^a-  F^  p  ). 
Thus  { Xjk  }jJC|  can  be  computed  at  the  expense  of  two  Fourier  Transforms  and  one 
inverse  Fourier  Transform.  This  requires  3A/  logA/  operations.  Notice  that  for  arbitrary  N, 
there  exists  an  M  such  that  2N-1  ^  Af  ^  4N  =  Sn  and  A/  is  a  power  of  2.  Thus  the  whole 
computation  requires  24n  logn  +  0(n)  operations. 
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